MIXING TIMES FOR THE SWAPPING ALGORITHM ON THE 
BLUME-EMERY-GRIFFITHS MODEL 
Abstract. We analyze the so-called Swapping Algorithm, a parallel version of the
well-known Metropolis-Hastings algorithm, on the mean-field version of the
Blume-Emery-Griffiths model in statistical mechanics. This model has two parameters
and, depending on their choice, exhibits either a first or a second order phase
transition. In agreement with a conjecture by Bhatnagar and Randall we find
that the Swapping Algorithm mixes rapidly in the presence of a second order phase
transition, while becoming slow when the phase transition is first order.
1. Introduction

Simulation methods are important tools in applied mathematics, e.g. in Bayesian
statistics, computational physics, econometrics, or computational biology. Among
them, Markov Chain Monte Carlo (MCMC, for short) methods belong to the
most popular simulation techniques. To sample from a distribution that cannot be
accessed directly, they rely on the ergodic theorem for Markov chains and construct
a Markov chain on a finite state space that converges to the desired distribution.
The first question is, of course, whether such a Markov chain exists. This is
answered in the affirmative by the Metropolis-Hastings chain: given an irreducible,
aperiodic Markov chain (the base chain) on the underlying state space, the
Metropolis-Hastings algorithm allows one to sample from a Markov chain with any
given invariant distribution with full support.
The idea of the Metropolis-Hastings algorithm, to always accept states with a higher
probability than the current state and to accept states that are less likely with a
probability equal to the ratio of the probability of the new state and the probability
of the current state, is borrowed from the Glauber dynamics in statistical physics. In
situations where the measure we want to sample from is a Gibbs distribution, as is
often the case in statistical mechanics, the operation of comparing two probabilities
can be performed quickly, i.e. with a small number of steps.

Like the Glauber dynamics, the Metropolis-Hastings algorithm usually converges
slowly when the target distribution is multi-modal, i.e. when there are states that
are locally very likely but globally not optimal. Such situations occur e.g. in
statistical physics in the presence of a phase transition, and the slow convergence of
the Glauber dynamics to the equilibrium distribution there is known under the name of
"metastability".

Several modifications of the Metropolis-Hastings algorithm have been proposed to
circumvent this problem and speed up the convergence. Among them the so-called
Swapping Algorithm (see [14]), also called Metropolis-coupled Markov chains or
Parallel Tempering (see [21]), and the Simulated Tempering Algorithm (see [23], [15],
and [20]) are very popular in applications, in particular on models from statistical
physics. In many situations they indeed seem to be able to improve the convergence
of the Metropolis chain. However, the theoretical results about these algorithms are
rather limited: Madras and Zheng [22] were able to show that the Swapping chain
converges quickly for the Curie-Weiss model (among others). On the other hand,
relying on results from Zheng's Ph.D. thesis ([29]), Bhatnagar and Randall [2] prove
that both the Swapping Algorithm and Simulated Tempering are slowly mixing for
the 3-state Potts model and conjecture that this is caused by the first order phase
transition in the Potts model (while the phase transition in the Curie-Weiss model is
of second order). The techniques of these two papers were generalized to a couple of
interesting situations by Huber, Schmidler, and Woodard, see [27] and [28]. A first
rapid convergence result for the Swapping Algorithm in a disordered situation was
proved by Löwe and Vermet in [19]. Ebbers and Löwe [X\ show that in disordered
models the conjecture by Bhatnagar and Randall is not correct. They prove that the
Swapping Algorithm mixes slowly on the Random Energy Model, even though this
model has only a third order phase transition. This, however, may actually be a true
disorder phenomenon, since in the theory of spin glasses, free energies are usually
smoothed by taking expectations over the disorder.

The aim of the current paper therefore is to analyze the conjecture of Bhatnagar
and Randall in another ordered model. A very appropriate scenario for this purpose
is the mean-field version of the so-called Blume-Emery-Griffiths (BEG, for short)
model. This model resembles a Curie-Weiss model with three states, ±1 and 0.
However, unlike in the Potts model, the state 0 plays a particular role. The BEG
model has been studied extensively as a model of many diverse systems, in particular
$\mathrm{He}^3$-$\mathrm{He}^4$ mixtures. A fact that makes it particularly interesting
for our purposes is that, for different parameter values, it exhibits both a
discontinuous first-order phase transition and a continuous second order phase
transition. This behavior has been conjectured for quite some time in the physics
literature, but only recently was rigorously shown to be true in a paper by Ellis
et al. [11]. One reason why the mean-field version of the BEG model is mathematically
challenging is that, even though the energy function depends on a two dimensional
parameter, the coordinates of this parameter are not independent. Other results on
the BEG model were obtained by Ellis et al. in subsequent papers ([3], [9], [10]),
where the mean-field BEG model was referred to as the mean-field Blume-Capel model.
The Glauber dynamics for this model was studied in a very recent paper by Kovchegov,
Otto and Titus [17]. They show that the mixing time of the Glauber dynamics undergoes
a transition from rapid to slow mixing depending on the parameter values; the
fascinating aspect of this result is that the mixing time transition coincides with
the equilibrium phase transition in the regime of the second order continuous phase
transition but differs in the regime of the first-order discontinuous phase
transition of the BEG model.
In the present paper, we consider the Swapping and Simulated Tempering Algorithms
for the BEG model in regimes where the model is multimodal and confirm the
conjecture by Bhatnagar and Randall insofar as we are able to show rapid convergence
(i.e. convergence in polynomial time in the system size) and torpid mixing (i.e.
convergence in exponential time) depending on whether there is a second or a first
order phase transition in the model.

As mentioned before, Woodard, Schmidler and Huber [27] were able to give the first
known result of rapid mixing of the Swapping Algorithm in a general, non
model-specific, setting, which in particular also covers situations where the target
distribution has more than one mode. We note that their results are so general that
they cannot be used in the case of rapid mixing in the BEG model. The technique used
by Woodard, Schmidler and Huber relies heavily on a static, non temperature-dependent,
partitioning of the state space. The underlying Metropolis chain needs to mix rapidly
in each part, for any temperature, in order for their technique to work. Furthermore,
the probability of each part must not get too small as the temperature is decreased.
In the rapid mixing case of the BEG model, this partitioning cannot be achieved.
Our proof relies on a dynamic, temperature dependent, partitioning in which one
part gets very unlikely as the temperature is decreased. More precisely, the
partitioning necessary for proving rapid mixing as stated in Theorem 3.4 below is given
in our formula (4.33), which uses the division of the state space for every temperature
introduced in (4.23) through (4.25). The necessity arises as there is no temperature
independent partitioning such that the Metropolis chain itself is rapidly mixing for
every part and for every temperature. Additionally, as defined in (4.39) below, it
is even necessary to switch, depending on the temperature, from one part to two
parts (per total magnetization direction) at the critical temperature. This technique
is indeed tailored for the bimodal situation of the BEG model.

We organize the paper in the following way: The second section introduces the
Swapping Algorithm (based on the Metropolis-Hastings chain) formally. At the same time
we also introduce the Tempering Algorithm, which is itself uninteresting for
applications in statistical mechanics, but provides a chain that can be compared to the
Swapping Algorithm, in particular when both algorithms are slowly mixing. In
Section 3 we introduce the mean-field BEG model. We propose a way to rewrite this
model and present a theorem on the free energy which is a refinement of some results
given in [11] and is necessary for our analysis of the Swapping Algorithm. Then
we give our results on the Swapping and Tempering Algorithms, a characterization
of the parameter regimes where these algorithms converge rapidly or slowly,
respectively. These results are proved in Sections 4 and 5, respectively. The proofs use
methods to bound the spectral gaps of Markov chains such as coupling methods or
Poincaré inequalities. In the appendices, we cite those bounds we need in the proofs.
Moreover, we prove a result on the speed of convergence of a coloring algorithm on
a graph and our results on the free energy in the BEG model. These lemmata turn
out to be useful in the proofs of our results in Sections 4 and 5.



2. Simulated Tempering and Swapping

In this section we introduce two variants of the Metropolis-Hastings Algorithm. These
algorithms include an additional change of temperature with the idea to speed up
the Metropolis chain when it is slow. They are specifically tailored for situations
where the invariant measure is a Gibbs measure with respect to some energy
function and the Metropolis Algorithm mixes slowly at low temperatures, but quickly at
high temperatures. We start with the Simulated Tempering Algorithm proposed by
Marinari and Parisi [23].

2.1. Simulated Tempering. From now on and for the rest of the paper let us
assume that the target distribution is a Gibbs measure on a finite set $\Omega$. To be
more specific, let $H(\cdot)$ denote an energy function or Hamiltonian of the system.
For every inverse temperature $\beta > 0$, the probability function on $\Omega$ given by

$$\pi_\beta(\sigma) := \frac{e^{\beta H(\sigma)}}{\sum_{\sigma' \in \Omega} e^{\beta H(\sigma')}} = \frac{e^{\beta H(\sigma)}}{Z(\beta)} \qquad (2.1)$$

is called a Gibbs measure. Note that the sign of our energy function differs from the
conventional choice in statistical mechanics. For the sake of this paper we will be
concerned with simulating such Gibbs measures.

Let $K_{gen}$ denote an aperiodic, symmetric and irreducible Markov chain on $\Omega$,
the so-called base chain, and $T_\beta(\cdot,\cdot)$ the corresponding
Metropolis-Hastings chain for $\pi_\beta$ defined by

$$T_\beta(x,y) := \begin{cases}
K_{gen}(x,y) & \text{if } x \neq y \text{ and } H(y) \geq H(x) \\
K_{gen}(x,y)\, e^{\beta (H(y) - H(x))} & \text{if } x \neq y \text{ and } H(y) < H(x) \\
1 - \sum_{z \neq x} T_\beta(x,z) & \text{otherwise.}
\end{cases} \qquad (2.2)$$
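The construction of the Metropolis-Hastings chain from a symmetric base chain can be sketched in a few lines. The following is only an illustration under the sign convention used here (weights proportional to $e^{\beta H}$); the function names and the four-state toy chain are hypothetical and not taken from the paper.

```python
import math

def gibbs(energy, beta):
    """Gibbs weights pi_beta(x) proportional to exp(beta * H(x)) (sign convention of this paper)."""
    weights = [math.exp(beta * h) for h in energy]
    z = sum(weights)  # normalizing constant Z(beta)
    return [w / z for w in weights]

def metropolis_kernel(base, energy, beta):
    """Metropolis-Hastings kernel built from a symmetric base chain `base` (list of rows)."""
    n = len(energy)
    T = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x == y:
                continue
            # Moves to higher H are always accepted; moves to lower H are
            # accepted with probability pi(y) / pi(x) = exp(beta * (H(y) - H(x))).
            accept = min(1.0, math.exp(beta * (energy[y] - energy[x])))
            T[x][y] = base[x][y] * accept
        T[x][x] = 1.0 - sum(T[x])  # remaining mass stays at x
    return T

# Hypothetical toy example: lazy nearest-neighbour walk on a path with four states.
toy_energy = [0.0, 1.0, -0.5, 2.0]
toy_base = [
    [0.75, 0.25, 0.00, 0.00],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.00, 0.00, 0.25, 0.75],
]
toy_kernel = metropolis_kernel(toy_base, toy_energy, beta=1.3)
```

Only energy differences enter the acceptance step, so the normalizing constant is never needed to run the chain; `gibbs` is included solely to check reversibility on small examples.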
For Gibbs measures on a finite set with some sort of neighborhood structure, one
commonly chooses $K_{gen}$ as a local random walk kernel. This algorithm, despite
being natural, is sometimes slow in natural situations, e.g. when sampling from the
low temperature distribution of the Curie-Weiss model (see e.g. [21]). To speed up
its convergence, we consider $\Omega \times \{0, 1, \ldots, M\}$ for some
$M \in \mathbb{N}$. In the case of Gibbs measures on a set $\Omega = S^N$, where $S$ is
some set with more than one element and $N$ is a large natural number, $M$ will
typically be chosen as $M := C_1 N$ for some constant $C_1 > 0$. The second component
of the new state space refers to the current temperature of the model (or the chain,
resp.). Define

$$\beta_i := \frac{i}{M}\beta \quad \text{and the probability measures} \quad \pi_i := \pi_{\beta_i}. \qquad (2.3)$$

As probability measure on $\Omega \times \{0, \ldots, M\}$ we take

$$\pi(x) = \pi((\sigma,i)) = \frac{1}{M+1}\,\pi_i(\sigma). \qquad (2.4)$$

We construct a Markov chain that starts in $(\sigma,i) \in \Omega \times \{0, 1, \ldots, M\}$
and chooses a new state $(\sigma',i)$ according to $T_{\beta_i}$. In a second step the
temperature is changed according to a similar Metropolis chain. The idea is that in
case the chain is in an energy valley, it can increase its temperature (reduce
$\beta$) and thereby reduce the cost of switching to another energy valley.
Explicitly, this works as follows:
In the first step let $i \in \{0, \ldots, M\}$ be fixed. Then a transition from
$(\sigma,i)$ to $(\sigma',i)$ has probability
$P_{st}((\sigma,i),(\sigma',i)) := T_{\beta_i}(\sigma,\sigma')$. In the second step let
$\sigma \in \Omega$ be fixed. Then the chain moves from $(\sigma,i)$ to $(\sigma,j)$
according to the transition probabilities

$$Q((\sigma,i),(\sigma,j)) := \begin{cases}
K_{tm}(i,j) & \text{if } \pi_j(\sigma) \geq \pi_i(\sigma) \text{ and } i \neq j \\
K_{tm}(i,j)\,\dfrac{\pi_j(\sigma)}{\pi_i(\sigma)} & \text{if } \pi_j(\sigma) < \pi_i(\sigma) \\
1 - \sum_{k \neq i} Q((\sigma,i),(\sigma,k)) & \text{if } i = j
\end{cases}$$

with

$$K_{tm}(i,j) := \begin{cases}
\ldots & \text{if } j = i \pm 1 \text{ and } j \in \{0, \ldots, M\} \\
0 & \text{if } |i-j| > 1 \\
1 - \sum_{k \neq i} K_{tm}(i,k) & \text{if } i = j.
\end{cases}$$

The actual Simulated Tempering Algorithm now consists of first applying a
temperature move $Q$, then a Metropolis move at the present temperature (the transition
matrix of which is denoted by T), and finally another temperature move. Hence, in
terms of transition matrices the Simulated Tempering algorithm is given by $Q P_{st} Q$.
Notice that the computation of $\pi_j(\sigma)/\pi_i(\sigma)$ in the matrix $Q$ needs
knowledge of the normalizing constants $Z(\beta_i)$ and $Z(\beta_j)$, which in most
cases are hard to obtain. This is the reason for introducing the following Swapping
Algorithm.

2.2. Swapping. The so-called Swapping Algorithm was suggested by Geyer in [14].
The basic idea of changing the temperature is maintained. As state space for the
Swapping chain we choose:

$$\Omega_{sw} := \Omega^{M+1}.$$

A natural choice for a probability measure on $\Omega_{sw}$ is:

$$\pi(x) := \prod_{i=0}^{M} \pi_i(x_i) = \prod_{i=0}^{M} \frac{e^{\beta_i H(x_i)}}{Z(\beta_i)} \qquad (2.5)$$

with $x = (x_0, \ldots, x_M) \in \Omega_{sw}$. As in the Simulated Tempering Algorithm,
the Swapping Algorithm consists of two steps. In the first step, we choose an
$i \in \{0, \ldots, M\}$ uniformly and update the $i$-th component of the current state
$x = (x_0, \ldots, x_M)$ according to the usual Metropolis chain $T_{\beta_i}$ at
inverse temperature $\beta_i$. In the second step we choose an
$i \in \{0, \ldots, M-1\}$ uniformly at random and swap the components $x_i$ and
$x_{i+1}$ of $x$ with probability

$$\min\left\{1, \frac{\pi(x_0, \ldots, x_{i+1}, x_i, \ldots, x_M)}{\pi(x_0, \ldots, x_i, x_{i+1}, \ldots, x_M)}\right\}.$$

So explicitly the first step works as follows: The transition probabilities from
$x = (x_0, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_M) \in \Omega_{sw}$ to
$x' = (x_0, \ldots, x_{i-1}, x_i', x_{i+1}, \ldots, x_M)$ are
$T_i(x,x') := T_{\beta_i}(x_i, x_i')$. For any $u, v$, let $\delta(u,v) = 1$ if $u = v$
and $0$ otherwise. Then the product chain

$$P(x,y) = \frac{1}{2}\,\delta(x,y) + \frac{1}{2(M+1)} \sum_{i=0}^{M} \delta(x_0,y_0) \cdots \delta(x_{i-1},y_{i-1})\, T_{\beta_i}(x_i,y_i)\, \delta(x_{i+1},y_{i+1}) \cdots \delta(x_M,y_M) \qquad (2.6)$$



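gives us a Markov chain on $\Omega_{sw}$. Also note that we never change more than one
component at a time. The second step is the temperature swap. Here the transition
probabilities from $x = (x_0, \ldots, x_i, x_{i+1}, \ldots, x_M)$ to
$x' = (x_0, \ldots, x_{i+1}, x_i, \ldots, x_M)$ are

$$Q(x,x') := \begin{cases}
K_{sw}(x,x') & \text{if } \pi(x') \geq \pi(x) \text{ and } x \neq x' \\
K_{sw}(x,x')\,\dfrac{\pi(x')}{\pi(x)} & \text{if } \pi(x') < \pi(x) \\
1 - \sum_{z \neq x} Q(x,z) & \text{if } x = x'.
\end{cases}$$

$K_{sw}$ is defined by

$$K_{sw}(x,x') := \begin{cases}
\dfrac{1}{2M} & \text{if } \exists\, i \text{ with } x_j = x_j' \;\forall j \notin \{i,i+1\}, \text{ and } x_i = x_{i+1}',\; x_{i+1} = x_i' \\
0 & \text{if } \nexists\, i \text{ with } x_j = x_j' \;\forall j \notin \{i,i+1\} \\
1 - \sum_{z \neq x} K_{sw}(x,z) & \text{if } x = x'.
\end{cases}$$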

Note that the factor $\frac{1}{2}$ in the definition of $K_{sw}$ and $P$ guarantees
that both $P$ and $Q$ are aperiodic and that the corresponding operators are positive.
Notice that all the normalizing constants in $Q$ and $P$ cancel out, such that the
transition probabilities can be effectively computed.
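The cancellation of the normalizing constants can be made concrete. For the product measure on the swapping state space, the ratio of the swapped to the unswapped configuration reduces to $e^{(\beta_{i+1}-\beta_i)(H(x_i)-H(x_{i+1}))}$, so a swap is decided from two energies and two inverse temperatures alone. The helper below is a minimal sketch; its name and argument order are hypothetical.

```python
import math

def swap_acceptance(beta_i, beta_next, h_i, h_next):
    """Probability of accepting the swap of x_i and x_{i+1}.

    beta_i, beta_next: inverse temperatures of components i and i+1
    h_i, h_next:       energies H(x_i) and H(x_{i+1})
    With weights proportional to exp(beta * H), the partition functions
    Z(beta_i) and Z(beta_{i+1}) appear in numerator and denominator and cancel.
    """
    log_ratio = (beta_next - beta_i) * (h_i - h_next)
    return min(1.0, math.exp(log_ratio))
```

With this sign convention a swap that moves the higher-energy state to the larger inverse temperature is always accepted, for example `swap_acceptance(0.5, 1.0, 2.0, 1.0)`.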

The Swapping Algorithm is now any reasonable combination of $P$ and $Q$; usually
one takes $QPQ$, as it is reversible with respect to $\pi$ if $Q$ and $P$ are
reversible (which in our situation is the case). The following theorem gives an idea
of how the speeds of convergence of swapping and tempering depend on each other.

Theorem 2.1 (Zheng [30]). If there exists a constant $\delta > 0$ such that

$$\sum_{x \in \Omega} \min\{\pi_i(x), \pi_{i+1}(x)\} \geq \delta \quad \text{for all } 1 \leq i < M,$$

then if the Swapping Algorithm converges in polynomial time, so does the Simulated
Tempering Algorithm.
3. Results

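Now we introduce the mean-field Blume-Emery-Griffiths (BEG) model. For a given
$K > 0$ the Hamilton function on $\Omega = \{-1,0,1\}^N$ is given by

$$H(\sigma) = H_K(\sigma) := -\sum_{j=1}^{N} \sigma_j^2 + \frac{K}{N}\Big(\sum_{j=1}^{N} \sigma_j\Big)^2 \qquad (3.1)$$

for $\sigma \in \Omega$. Here a state $\sigma$ is said to have spin $\sigma_i$ in
coordinate $i$. Therefore the Gibbs measure of the BEG model, which we want to sample
from, is

$$\pi_\beta(\sigma) = \frac{e^{\beta H(\sigma)}}{Z(\beta)} = \frac{e^{\beta H(\sigma)}}{\sum_{\sigma'} e^{\beta H(\sigma')}} = \frac{e^{\beta\left(-\sum_{j=1}^{N} \sigma_j^2 + \frac{K}{N}\left(\sum_{j=1}^{N} \sigma_j\right)^2\right)}}{\sum_{\sigma'} e^{\beta H(\sigma')}} \qquad (3.2)$$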

with $Z(\beta)$ being the normalization constant. We see that in the mean-field BEG
model, the energy function solely depends on the parameters $\sum_{j=1}^{N} \sigma_j^2$
and $\big(\sum_{j=1}^{N} \sigma_j\big)^2$, the last one being the term of interactions
between spins. It can therefore be expected that the mean-field BEG model can be
rigorously analyzed. However, as the two parameters are strongly dependent, the
analysis is not easy. It was not until the paper by Ellis et al. [11] that one obtained
a thorough understanding of the macroscopic behavior of the mean-field BEG model. In a
nutshell their result coincides with an intuitive understanding of the model. If $K$ is
large enough, the second term becomes dominant and the model behaves like the
Curie-Weiss model (see [8] for an analysis of the latter model): it has a second order
phase transition at some critical temperature $\beta_c^{(2)}(K)$. When $K$ becomes
smaller, this phase transition however is of first order: the low temperature
macro-states emerge discontinuously from the high-temperature macro-state. If $K$ is
eventually too small, there is no phase transition at all.
We will first do some system specific preparations, in order to get more familiar with
the model. To simplify notation define the functions

$$S_N(\sigma) = \sum_{i=1}^{N} \sigma_i \qquad (3.3)$$

$$R_N(\sigma) = \sum_{i=1}^{N} \sigma_i^2 \qquad (3.4)$$

where $S_N$ gives the total magnetization, and $R_N$ the total number of non-zero spins
of the state $\sigma$. Using this notation we define

$$A_{s,r} := \{\sigma \in \Omega \mid S_N(\sigma) = s,\; R_N(\sigma) = r\} \qquad (3.5)$$

as the set of states with a fixed number of 0s and fixed magnetization. As we consider
the mean-field BEG model, all states in $A_{s,r}$ are basically indistinguishable in
the system. We will later (see Theorem 4.7 below) see that the Metropolis chain
$T_\beta$ restricted to $A_{s,r}$ mixes rapidly for any combination of $s$ and $r$.
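Since the Hamiltonian depends on a configuration only through the total magnetization and the number of non-zero spins, it can be evaluated from these two statistics. The following sketch illustrates this; the function names are hypothetical, and configurations are represented as tuples with entries in $\{-1, 0, 1\}$.

```python
def magnetization(sigma):
    """S_N(sigma): sum of all spins."""
    return sum(sigma)

def nonzero_spins(sigma):
    """R_N(sigma): number of spins equal to +1 or -1."""
    return sum(s * s for s in sigma)

def beg_energy(sigma, K):
    """Mean-field BEG energy H_K(sigma) = -R_N(sigma) + (K / N) * S_N(sigma)^2."""
    n = len(sigma)
    return -nonzero_spins(sigma) + K / n * magnetization(sigma) ** 2
```

Two configurations that are permutations of each other, such as `(1, -1, 0, 0)` and `(0, 1, 0, -1)`, share both statistics and hence the same energy.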

In order to be able to better address non-negligible differences in the state space
consider

$$\Gamma = \Gamma_N := \Big\{a = (a_{-1}, a_0, a_1) \in \mathbb{R}^3 \;\Big|\; a_i \geq 0 \;\forall i,\; \sum_i a_i = 1,\; N a_i \in \mathbb{N} \;\forall i = -1,0,1\Big\} \qquad (3.6)$$

such that

$$\Omega = \bigcup_{a \in \Gamma} \Big\{\sigma \in \Omega \;\Big|\; \sum_{j=1}^{N} \delta(\sigma_j, i) = N a_i \;\forall i \in \{-1,0,1\}\Big\} \qquad (3.7)$$

is a disjoint union. Note that all states in one of the sets on the right hand side of
(3.7) only differ by an index permutation and thereby have the same energy. This
is inspired by Gore's and Jerrum's work on the Potts model [16], as the following
calculation makes the state space easier to handle.

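Considering

$$\pi_\beta(\sigma \text{ has type } Na) = Z(\beta)^{-1} \binom{N}{Na_{-1},\, Na_0,\, Na_1}\, e^{-N\beta\left(a_{-1} + a_1 - K(a_1 - a_{-1})^2\right)} \qquad (3.8)$$

and using Stirling's approximation one obtains

$$\pi_\beta(\sigma \text{ has type } Na) = Z(\beta)^{-1} N^{-1} e^{-N\left(\sum_i a_i \log a_i\right) + \Delta(a)}\, e^{-N\beta\left(a_{-1} + a_1 - K(a_1 - a_{-1})^2\right)} \qquad (3.9)$$

with $|\Delta(a)| = O(1)$ if there exists an $\varepsilon > 0$ with $a_i > \varepsilon$
for all $i \in \{-1,0,1\}$. So understanding

$$f_\beta(a) := \beta\left(-a_{-1} - a_1 + K(a_1 - a_{-1})^2\right) - \sum_i a_i \log a_i \qquad (3.10)$$

will give us a better insight into how the BEG model behaves as a function of $\beta$.
First, we prove the following result in the appendix:

Theorem 3.1. $f_\beta$ has at most three local maxima on
$\Gamma_\infty := \{(a_{-1}, a_0, a_1) \in \mathbb{R}_+^3 : \sum_{i=-1}^{1} a_i = 1\}$.
There are no further maxima on the boundary of $\Gamma_\infty$.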
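The change from one to several maxima can be observed numerically by evaluating the exponent $f_\beta(a) = \beta(-a_{-1} - a_1 + K(a_1 - a_{-1})^2) - \sum_i a_i \log a_i$ on a grid of the simplex. The sketch below is purely illustrative: the grid resolution and the two parameter pairs are arbitrary choices, and a grid search is of course no substitute for the analysis in the appendix.

```python
import math

def f_beta(a_minus, a_zero, a_plus, beta, K):
    """Exponent f_beta(a) for a = (a_{-1}, a_0, a_1), with the convention 0 * log 0 = 0."""
    def xlogx(t):
        return t * math.log(t) if t > 0 else 0.0
    energy = beta * (-a_minus - a_plus + K * (a_plus - a_minus) ** 2)
    entropy = -(xlogx(a_minus) + xlogx(a_zero) + xlogx(a_plus))
    return energy + entropy

def grid_argmax(beta, K, n=60):
    """Return the grid point (a_{-1}, a_0, a_1) with denominators n that maximizes f_beta."""
    best_value, best_point = None, None
    for i in range(n + 1):
        for j in range(n + 1 - i):
            point = (i / n, (n - i - j) / n, j / n)
            value = f_beta(*point, beta, K)
            if best_value is None or value > best_value:
                best_value, best_point = value, point
    return best_point

# Small K: the maximizer has (almost) zero magnetization a_1 - a_{-1}.
symmetric = grid_argmax(beta=1.0, K=0.5)
# Large K: the maximizer is strongly magnetized (its mirror image is equally good).
magnetized = grid_argmax(beta=1.0, K=2.0)
```

For the larger value of $K$ the mirror image of the returned point attains the same value, which reflects the bimodal situation discussed in the text.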

Moreover, in [11] (Sections 3 and 5) one finds a complete description of the set
$\mathcal{E}_{\beta,K}$ of the maxima of $f_\beta$ on $\Gamma_\infty$, i.e. the set of
canonical equilibrium macro-states of the model, for all $\beta$ and $K$. We will adopt
the notation of [11] for the critical values of the parameters $\beta$ and $K$: there
exists a critical value $\beta_c = \log 4$, such that $\mathcal{E}_{\beta,K}$ has two
different forms for $0 < \beta \leq \beta_c$ and for $\beta > \beta_c$. More precisely,
for $0 < \beta \leq \beta_c$, there exists a critical value
$K_c^{(2)}(\beta) = \frac{1}{4\beta e^{-\beta}} + \frac{1}{2\beta}$, such that
$\mathcal{E}_{\beta,K}$ is unimodal for $0 < K \leq K_c^{(2)}(\beta)$, and bimodal for
$K > K_c^{(2)}(\beta)$. Moreover, $\mathcal{E}_{\beta,K}$ exhibits a continuous
bifurcation at $K_c^{(2)}(\beta)$, which corresponds to a second order phase transition.

For $\beta > \beta_c$, there exists a critical value $K_c^{(1)}(\beta)$ such that
$\mathcal{E}_{\beta,K}$ is unimodal for $0 < K < K_c^{(1)}(\beta)$, trimodal for
$K = K_c^{(1)}(\beta)$ and bimodal for $K > K_c^{(1)}(\beta)$. Moreover,
$\mathcal{E}_{\beta,K}$ exhibits a discontinuous bifurcation at $K_c^{(1)}(\beta)$,
which corresponds to a first-order phase transition. The quantity $K_c^{(1)}(\beta)$ is
defined implicitly in [11], but an explicit form is not obtained. This is consistent
with the general challenge in analyzing first-order, discontinuous phase transitions in
statistical physics models. As a consequence, studying the behavior of
$K_c^{(1)}(\beta)$ as $\beta \to +\infty$ is not trivial. We prove in the appendix
that this limit exists, and Ellis et al. [11] indicate that numerical simulations lead
to the conjecture that $K_{low} := \lim_{\beta \to +\infty} K_c^{(1)}(\beta)$ is equal
to 1.

A slight difficulty of the above discussion is also that the conventional picture of
statistical mechanics, where one studies a model depending on temperature, is turned
upside down: The critical parameters are defined as functions of $\beta$ and not the
other way round.

In Section 5 of [11] the authors extrapolate these results obtained by fixing $\beta$
and varying $K$ to results about the phase transition behavior of the canonical
equilibrium macro-states for fixed $K$ and varying $\beta$. We define the tricritical
value $K_c = K_c^{(2)}(\beta_c) \approx 1.0820$. Then for $K > K_c$, there exists a
value $\beta_c^{(2)}(K)$ such that $\mathcal{E}_{\beta,K}$ exhibits a second order
phase transition at $\beta = \beta_c^{(2)}(K)$: there exists a $\delta > 0$, such that
$\mathcal{E}_{\beta,K}$ exhibits a single phase for
$\beta \in (\beta_c^{(2)}(K) - \delta, \beta_c^{(2)}(K)]$ and two distinct phases for
$\beta \in (\beta_c^{(2)}(K), \beta_c^{(2)}(K) + \delta)$. And for
$K_{low} < K < K_c$ (we make precise in Corollary C.3 why we need the condition
$K > K_{low}$), there exists a value $\beta_c^{(1)}(K)$ such that
$\mathcal{E}_{\beta,K}$ exhibits a first-order phase transition at
$\beta = \beta_c^{(1)}(K)$: there exists a $\delta > 0$, such that
$\mathcal{E}_{\beta,K}$ exhibits a single phase for
$\beta \in (\beta_c^{(1)}(K) - \delta, \beta_c^{(1)}(K))$, three distinct phases at
$\beta = \beta_c^{(1)}(K)$, and two distinct phases for
$\beta \in (\beta_c^{(1)}(K), \beta_c^{(1)}(K) + \delta)$.

MIRKO EBBERS, HOLGER KNÖPFEL, MATTHIAS LÖWE, AND FRANCK VERMET
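The numerical value of the tricritical point can be checked with a short computation. The sketch assumes the closed form $K_c^{(2)}(\beta) = (e^{\beta} + 2)/(4\beta)$ for the second order critical line; this form is an assumption here, obtained by requiring that the curvature of the exponent $f_\beta$ in the magnetization direction vanishes at the symmetric maximizer, and it should be compared with [11] before being relied upon. The function name is hypothetical.

```python
import math

def k_c2(beta):
    """Assumed closed form of the second order critical line K_c^(2)(beta)."""
    return (math.exp(beta) + 2.0) / (4.0 * beta)

beta_c = math.log(4.0)          # tricritical inverse temperature
tricritical_K = k_c2(beta_c)    # should be close to the quoted value 1.0820
```

Under this assumption the computed value agrees with the tricritical value $K_c \approx 1.0820$ stated in the text to four decimal places.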

These properties imply in particular that the Metropolis algorithm is torpidly mixing
for the BEG model, for the values of $(\beta, K)$ such that the model is multimodal, if
the base chain is a local random walk kernel. In fact, we know that
$\pi_\beta(\sigma \text{ has type } Na)$ has exponential structure. We also know that
for suitable $K$, $f_\beta$ has at least two modes for sufficiently large $\beta$
(depending on $K$). Take $a$ to represent one of the maximum points. If we define
$B_\varepsilon(a)$ as the ball of radius $\varepsilon$ centered at $a$ in the
appropriate metric space, this leads to $B_\varepsilon(a)$ having exponentially small
conductance, therefore representing a bad cut in the state space. For more details see
our Section 5, where this technique is used in the more complicated setup of swapping.

In the present paper, we will consider the Simulated Tempering Algorithm and the
Swapping Algorithm, which are defined in Section 2, for values of $(\beta, K)$ such
that the Metropolis algorithm is torpidly mixing for the BEG model. We will focus on
two regions of the parameters $(\beta, K)$ where we show the influence of the order of
the phase transition on the speed of convergence of both algorithms. For the Simulated
Tempering Algorithm and the Swapping Algorithm, the corresponding
Metropolis-Hastings chain for the measure $\pi_\beta$, defined in (3.2), is given by
(2.2), with the proposal chain

$$K_{gen}(x,y) := \frac{1}{4N}$$

if $x, y \in \{-1,0,1\}^N$ differ in exactly one spin $x_i \neq y_i$, for some
$i \in \{1, \ldots, N\}$, and $K_{gen}(x,x) = \frac{1}{2}$. In all other cases define
$K_{gen}(x,y) := 0$.
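One step of the resulting single-spin Metropolis chain for the BEG model might be sketched as follows. The sketch assumes that the lazy proposal holds with probability $\frac{1}{2}$ and otherwise picks one of the $2N$ neighbouring configurations uniformly; the uniform choice is an assumption made for illustration, and all function names are hypothetical.

```python
import math
import random

def beg_energy(sigma, K):
    """Mean-field BEG energy: -(number of non-zero spins) + (K / N) * magnetization^2."""
    n = len(sigma)
    return -sum(s * s for s in sigma) + K / n * sum(sigma) ** 2

def metropolis_step(sigma, beta, K, rng):
    """One lazy single-spin Metropolis step for weights proportional to exp(beta * H)."""
    if rng.random() < 0.5:
        return sigma  # lazy step: hold with probability 1/2
    i = rng.randrange(len(sigma))  # coordinate to change
    new_spin = rng.choice([s for s in (-1, 0, 1) if s != sigma[i]])
    proposal = sigma[:i] + (new_spin,) + sigma[i + 1:]
    delta = beg_energy(proposal, K) - beg_energy(sigma, K)
    # Higher energy means higher probability here, so such moves are always accepted.
    if delta >= 0 or rng.random() < math.exp(beta * delta):
        return proposal
    return sigma

# Usage: a short trajectory started from the all-zero configuration.
rng = random.Random(0)
state = (0,) * 6
for _ in range(100):
    state = metropolis_step(state, beta=1.0, K=1.5, rng=rng)
```

Recomputing the full energy in every step costs $O(N)$ operations; an implementation meant for large $N$ would instead track the magnetization and the number of non-zero spins incrementally.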

The BEG model, as Ellis et al. [11] show, exhibits different phase behavior depending
on $K$. For small $K < K_{low}$ there is, for every temperature, only one macro state,
which implies that there is no phase transition.

The first regime we want to look at is $K_{low} < K < K_c$ with
$K_{low} := \lim_{\beta \to +\infty} K_c^{(1)}(\beta)$ and $K_c = K_c^{(2)}(\log 4)$ as
in [11, Eq. (3.19)]. The model exhibits a discontinuous phase transition at a
$\beta_c^{(1)}(K)$ depending on $K$. We will use this discontinuity in the phase
to show

Theorem 3.2. Consider the BEG model with $K_{low} < K < K_c$. Then for
$\beta > \beta_c^{(1)}(K)$, the Simulated Tempering Algorithm is torpidly mixing, since

$$\mathrm{Gap}(Q P_{st} Q) \leq e^{-cN}$$

holds for $c > 0$ as constructed in Theorem 5.1.

We prove this theorem in Section 5.

Corollary 3.3. This implies torpid mixing of the Swapping Algorithm in this regime. 

For $K > K_c$ the model shows a continuous phase transition at $\beta_c^{(2)}(K)$,
which will lead to a Swapping chain that behaves like the Swapping chain of the
Curie-Weiss model, which Madras and Zheng already considered in [22]. However, the
technique used by Madras and Zheng relies on a static, non temperature-dependent,
partitioning of the state space. The underlying Metropolis chain needs to mix rapidly
in each part, for any temperature. In the rapid mixing case of the BEG model, this
partitioning cannot be achieved. Our proof relies on a dynamic, temperature dependent,
partitioning in which one part gets very unlikely as the temperature is decreased. For
the BEG model, the proof becomes much more involved, but we can use ideas of Madras and
Zheng [22] and (a corrected version of ideas in) Bhatnagar and Randall [2] to get

Theorem 3.4. For $K > K_c$ and $\beta > \beta_c^{(2)}(K)$, the Swapping chain with its
transition kernel $QPQ$ for the BEG model is rapidly mixing, since

$$\mathrm{Gap}(QPQ) \geq \frac{1}{p(N)}$$

for some polynomial $p$ of $N$.

We prove this theorem in Section 4.

Remark. Giving an explicit bound would need a longer argument at the end of the
proof of Theorem 4.5, which does not give a better insight into the situation. As we do
not believe our technique to give a sharp bound anyway, we refrain from doing this
extra step and do not give a suitable polynomial explicitly.

Corollary 3.5. This implies rapid mixing of the Simulated Tempering chain $Q P_{st} Q$
in this regime.

4. Proof of Theorem 3.4

4.1. General partitioning of the state space in the case of $K > K_c$. We will
begin to show Theorem 3.4 by partitioning the state space

$$\Omega = \{-1,0,1\}^N = \Omega_+ \cup \Omega_- \qquad (4.1)$$

into two disjoint almost equally large parts

$$\Omega_+ = \Big\{\sigma \in \Omega \;\Big|\; \sum_i \sigma_i > 0\Big\} \cup \{(0, \ldots, 0)\} \cup \Big\{\sigma \neq (0, \ldots, 0) \;\Big|\; \sum_i \sigma_i = 0, \text{ with the first non-zero coordinate} = +1\Big\}$$

$$\Omega_- = \Big\{\sigma \in \Omega \;\Big|\; \sum_i \sigma_i < 0\Big\} \cup \Big\{\sigma \neq (0, \ldots, 0) \;\Big|\; \sum_i \sigma_i = 0, \text{ with the first non-zero coordinate} = -1\Big\}.$$

Using this partitioning we will decompose $\Omega_{sw} = \Omega^{M+1}$ in the same way
as Madras and Zheng in [22, Section 4, Step two].

Let $\tilde{\Omega}_{sw} := \{+,-\}^M$ and take $x \in \Omega_{sw}$. Define the
signature of $x$ by

$$\mathrm{sgn} : \Omega_{sw} \to \tilde{\Omega}_{sw}, \quad x \mapsto v \qquad (4.2)$$

with

$$v_i := \begin{cases}
+ & \text{if } x_{i+1} \in \Omega_+ \\
- & \text{if } x_{i+1} \in \Omega_-
\end{cases} \qquad (4.3)$$

such that $\mathrm{sgn}(x)$ contains the sign of the total magnetization of each
component of $x$ except for the component for $\beta = 0$. The first component of $x$
will have a special role, which will become apparent within the next paragraphs.

We will decompose the state space using the number of $+$-signs in $\mathrm{sgn}(x)$.
For fixed $k \in \{0, \ldots, M\}$ define

$$\tilde{\Omega}_k := \{v \in \tilde{\Omega}_{sw} \mid v \text{ has exactly } k\ +\text{-signs}\} \qquad (4.4)$$

and note that

$$\Omega_{sw} = \bigcup_{k=0}^{M} \Omega_k$$

is a disjoint union of

$$\Omega_k := \{x \in \Omega_{sw} \mid \mathrm{sgn}(x) \in \tilde{\Omega}_k\}. \qquad (4.5)$$
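The signature map and the resulting decomposition by the number of $+$-signs can be sketched directly from the definitions. In the sketch, a state of the Swapping chain is a tuple of $M+1$ spin configurations; the function names are hypothetical.

```python
def in_omega_plus(sigma):
    """True if sigma lies in Omega_+, False if it lies in Omega_-."""
    total = sum(sigma)
    if total != 0:
        return total > 0
    for spin in sigma:
        if spin != 0:
            return spin == 1  # tie broken by the first non-zero coordinate
    return True  # the all-zero configuration is assigned to Omega_+

def signature(x):
    """Signs of the components x_1, ..., x_M; the component x_0 (beta = 0) is skipped."""
    return tuple('+' if in_omega_plus(component) else '-' for component in x[1:])

def plus_count(x):
    """Index k of the block Omega_k that contains x."""
    return signature(x).count('+')

# Usage: M = 4 temperatures above beta_0 and N = 3 spins per component.
x = ((0, 0, 0), (1, -1, 0), (-1, 1, 0), (-1, -1, 1), (0, 0, 0))
```

For this `x` the signature is `('+', '-', '-', '+')`, so `x` belongs to the block with two $+$-signs.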

Define $\bar{Q}$ to be the aggregated transition matrix and $(QPQ)|_{\Omega_k}$ to be
the restriction of the chain $QPQ$ to the set $\Omega_k$, as defined in Theorem A.10
for this decomposition. Using Lemma A.9 and Theorem A.10 we get

$$\mathrm{Gap}(QPQ) \geq \mathrm{Gap}\big(Q^{\frac{1}{2}}(QPQ)Q^{\frac{1}{2}}\big) \geq \mathrm{Gap}(\bar{Q}) \cdot \min_{k \in \{0, \ldots, M\}} \mathrm{Gap}\big((QPQ)|_{\Omega_k}\big). \qquad (4.6)$$

Citing [22, Sec. 4, step three], we can do all displayed calculations in our setting as
well, which eventually leads to

$$\mathrm{Gap}\big(Q^{\frac{1}{2}}(QPQ)Q^{\frac{1}{2}}\big) \geq \frac{1}{8}\,\mathrm{Gap}(\bar{Q}) \cdot \min_{k \in \{0, \ldots, M\}} \mathrm{Gap}(Q_k P_k Q_k) \qquad (4.7)$$

with $P_k$ and $Q_k$ being the restrictions of $P$ and $Q$ to $\Omega_k$, respectively
(for a definition see Theorem A.10 in the appendix).

The transition kernel $\bar{Q}$ is, in this setting, responsible for changing the
number of components in $x \in \Omega_{sw}$ which are in $\Omega_+$ and $\Omega_-$,
respectively. $\bar{Q}$ is essentially a one dimensional nearest neighbor random walk
on $\{0, \ldots, M\}$ whose spectral gap is well understood. Due to the symmetry in the
model it does not (noticeably) matter for the chain whether we restrict a given
component $k$ of $x$ to be in $\Omega_+$ or $\Omega_-$. This leads to

$$\mathrm{Gap}(Q_k P_k Q_k) \approx \mathrm{Gap}(Q_{k'} P_{k'} Q_{k'}) \quad \forall k, k' \in \{0, \ldots, M\} \qquad (4.8)$$

where $\approx$ means that both spectral gaps are of the same (polynomial or
exponential) order. This in turn implies
$\min_{k \in \{0, \ldots, M\}} \mathrm{Gap}(Q_k P_k Q_k) \approx \mathrm{Gap}(Q_M P_M Q_M)$.
We will write this as

$$\min_{k \in \{0, \ldots, M\}} \mathrm{Gap}(Q_k P_k Q_k) \approx \mathrm{Gap}(Q_M P_M Q_M) = \mathrm{Gap}(Q P_+ Q), \qquad (4.9)$$

where by abuse of notation, $Q_M$ is denoted by $Q$ and $P_M$ by $P_+$. Note also that
all arguments of the proof work in exactly the same way for any
$k \in \{0, \ldots, M\}$. The only difference is which part of the state space we look
at, for a given temperature $\beta_i$. The quantities $\mathrm{Gap}(\bar{Q})$ and
$\mathrm{Gap}(Q P_+ Q)$ will be bounded below in the following subsections 4.2 and 4.3.

4.2. Speed of convergence of $\bar{Q}$. Following in principle the proof given in
[22, Section 5] (also see [2HJ Section 2.5] for more details) we gain

Lemma 4.1. The spectral gap of the aggregated chain $\bar{Q}$ satisfies

Remark. Note that the roles of the number of spins $N$ and the number of temperatures
$M$ are interchanged between this paper and the reference given above.
On the other hand, the notation now agrees with the standard notation in statistical
mechanics.

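Proof. We first verify that the probability for an accepted swapping move is bounded
below by a constant. Using the notation given in [22] let us define

$$\rho_{i,i+1}(x_i, x_{i+1}) := \min\left\{1, \frac{\pi_i(x_{i+1})\,\pi_{i+1}(x_i)}{\pi_i(x_i)\,\pi_{i+1}(x_{i+1})}\right\}.$$

Then

$$\begin{aligned}
\rho_{i,i+1} &= \min\left\{1, \frac{e^{\beta_{i+1} H(x_i)}\, e^{\beta_i H(x_{i+1})}}{e^{\beta_i H(x_i)}\, e^{\beta_{i+1} H(x_{i+1})}}\right\} \\
&= \min\left(1, e^{\frac{i+1}{M}\beta H(x_i) + \frac{i}{M}\beta H(x_{i+1}) - \frac{i}{M}\beta H(x_i) - \frac{i+1}{M}\beta H(x_{i+1})}\right) \\
&= \min\left(1, e^{\frac{\beta}{M} H(x_i) - \frac{\beta}{M} H(x_{i+1})}\right) \\
&\geq e^{-\frac{\beta}{M}\,|H(x_i) - H(x_{i+1})|} \\
&\geq e^{-\beta \frac{N(K+1)}{M}} \qquad (4.10)
\end{aligned}$$

as $H \leq (K+1)N$ implies (4.10) to be true.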
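For small systems the lower bound on the swap probability can be verified by enumerating all configurations. The sketch below assumes an equally spaced ladder of inverse temperatures, so that neighbouring inverse temperatures differ by $\beta/M$; this spacing and all function names are assumptions made for illustration.

```python
import itertools
import math

def beg_energy(sigma, K):
    """Mean-field BEG energy: -(number of non-zero spins) + (K / N) * magnetization^2."""
    n = len(sigma)
    return -sum(s * s for s in sigma) + K / n * sum(sigma) ** 2

def min_swap_probability(n, K, beta, M):
    """Smallest swap acceptance probability over all pairs of configurations of n spins."""
    energies = [beg_energy(s, K) for s in itertools.product((-1, 0, 1), repeat=n)]
    step = beta / M  # difference of neighbouring inverse temperatures (assumed ladder)
    return min(
        min(1.0, math.exp(step * (h_i - h_next)))
        for h_i in energies
        for h_next in energies
    )

def swap_lower_bound(n, K, beta, M):
    """Right-hand side exp(-beta * N * (K + 1) / M) of the bound."""
    return math.exp(-beta * n * (K + 1) / M)
```

For instance, with `n = 4`, `K = 1.5`, `beta = 2.0` and `M = 8` the enumerated minimum is $e^{-1.5}$, comfortably above the bound $e^{-2.5}$, which shows that the bound is not tight.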

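Due to the definition of $\Omega_+$ and $\Omega_-$ it is clear that
$\pi_\beta(\Omega_+) = \frac{1}{2}(1 + 1/Z_\beta)$ for any $\beta > 0$. Recalling
equations (3.8) and (3.10) and Theorem 3.1 it is possible to find for any $\beta > 0$
constants $0 < c_1 < c_2$ such that $Z_{\beta'} \in [e^{c_1 N}, e^{c_2 N}]$ for all
$\beta' \in [0, \beta]$. Using

$$1 \leq \left(1 + e^{-c_1 N}\right)^M \leq e^{e^{-c_1 N} M} \to 1 \qquad (4.11)$$

as $N \to \infty$, we gain a constant $a > 1$ such that for all sufficiently large $N$
and any $v \in \{-,+\}^M$

$$\pi(\Omega \times \Omega_{v_0} \times \cdots \times \Omega_{v_{M-1}}) \in 2^{-M}[a^{-1}, a]$$

holds. Recalling the definition of $\Omega_k$ in (4.5) we conclude

$$\pi(\Omega_k) = \sum_{v \in \tilde{\Omega}_k} \pi(\Omega \times \Omega_{v_0} \times \cdots \times \Omega_{v_{M-1}}) \in \binom{M}{k} 2^{-M}\, [a^{-1}, a]. \qquad (4.12)$$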
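The identity for the mass of $\Omega_+$ rests on the symmetry $\sigma \mapsto -\sigma$, which leaves the energy unchanged and exchanges the two parts except for the all-zero configuration. It can be confirmed by brute force for a small number of spins; the following sketch uses hypothetical function names.

```python
import itertools
import math

def beg_energy(sigma, K):
    """Mean-field BEG energy: -(number of non-zero spins) + (K / N) * magnetization^2."""
    n = len(sigma)
    return -sum(s * s for s in sigma) + K / n * sum(sigma) ** 2

def in_omega_plus(sigma):
    """Membership in Omega_+ (the all-zero configuration is included)."""
    total = sum(sigma)
    if total != 0:
        return total > 0
    for spin in sigma:
        if spin != 0:
            return spin == 1
    return True

def mass_of_omega_plus(n, K, beta):
    """Return (pi_beta(Omega_+), Z_beta) by enumerating all 3^n configurations."""
    z = 0.0
    plus = 0.0
    for sigma in itertools.product((-1, 0, 1), repeat=n):
        weight = math.exp(beta * beg_energy(sigma, K))
        z += weight
        if in_omega_plus(sigma):
            plus += weight
    return plus / z, z
```

Because the partition function grows exponentially in the number of spins, the correction $1/Z_\beta$ is already tiny for moderate $N$, which is why the two parts are called almost equally large.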



As we want to use Lemma A.7 later on, in order to compare $\bar{Q}$ to an easier
Markov chain, it is of interest to study the quantity

$$\pi(\Omega_i)\,\bar{Q}(i, i+1). \qquad (4.13)$$

Consider an $x \in \Omega_i$ and $y \in \Omega_j$. In case $|i - j| > 1$ it is
obviously impossible for the pure Swapping chain $Q$ to accept a step from $x$ to $y$,
thus:

$$Q(x,y) = 0, \quad \text{if } x \in \Omega_i,\; y \in \Omega_j \text{ with } |i - j| > 1.$$

Hence,

$$\bar{Q}(i,j) = 0, \quad \text{if } |i - j| > 1.$$
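The only way $i$ can change is by interchanging the first two coordinates $x_0$ and
$x_1$ of $x$. Writing $(0,1)x$ for the state obtained from $x$ by this interchange, we
obtain for $0 \leq i < M$

$$\begin{aligned}
\pi(\Omega_i)\,\bar{Q}(i, i+1) &= \sum_{x \in \Omega_i} \sum_{y \in \Omega_{i+1}} \pi(x)\, Q(x,y) \\
&= \sum_{x_0 \in \Omega_+} \sum_{x_1 \in \Omega_-} \sum_{x' \in \Omega_i'} \pi(x)\, Q(x, (0,1)x) \\
&= \sum_{x_0 \in \Omega_+} \sum_{x_1 \in \Omega_-} \sum_{x' \in \Omega_i'} \pi(x)\, \frac{1}{2M}\, \rho_{0,1}(x_0, x_1) \\
&= \frac{1}{2M} \sum_{x_0 \in \Omega_+} \sum_{x_1 \in \Omega_-} \pi_0(x_0)\, \pi_1(x_1)\, \rho_{0,1}(x_0, x_1) \sum_{x' \in \Omega_i'} \prod_{j=2}^{M} \pi_j(x_j) \\
&\geq \frac{1}{2M}\, e^{-\beta \frac{N(K+1)}{M}} \sum_{x_0 \in \Omega_+} \sum_{x_1 \in \Omega_-} \pi_0(x_0)\, \pi_1(x_1) \sum_{x' \in \Omega_i'} \prod_{j=2}^{M} \pi_j(x_j) \\
&\geq \frac{1}{2M} \binom{M-1}{i} \frac{1}{2^{M+1}}\, e^{-\beta \frac{N(K+1)}{M}}\, a^{-1}
\end{aligned}$$

with the natural definitions of the sets in the last two lines.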

We will now give another, much simpler, Markov chain whose spectral gap has been
intensively studied. Consider the symmetric random walk $S$ on $\{0, \dots, M\}$, i.e.

$$S(0, 1) = S(0, 0) = S(M, M-1) = S(M, M)$$

$$= S(i, i-1) = S(i, i+1) = \tfrac{1}{2} \quad \text{for } 0 < i < M.$$

Let $r(i) = \binom{M}{i} 2^{-M}$ be the binomial distribution on $\{0, \dots, M\}$, and let $R$ denote the
Metropolis chain with proposal chain $S$ and reversible distribution $r(i)$. As has been
shown by Diaconis and Saloff-Coste [pp. 698 and 719] $R$ satisfies

— < Gap(i?) < — . (4.14) 
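The comparison chain $R$ is easy to write down explicitly. The sketch below (hypothetical helper names, plain Python) builds the Metropolis chain with the lazy nearest-neighbour proposal described above and the binomial target, and checks that the binomial distribution is reversible for it. It does not reproduce the spectral gap bounds of (4.14).

```python
from math import comb

def binomial_weights(M):
    """r(i) = C(M, i) / 2^M for i = 0, ..., M."""
    return [comb(M, i) / 2 ** M for i in range(M + 1)]

def metropolis_binomial_chain(M):
    """Metropolis chain R on {0, ..., M} with proposal S and target r.

    The proposal moves to each existing neighbour with probability 1/2;
    at the end points the remaining mass 1/2 is a holding probability.
    """
    r = binomial_weights(M)
    R = [[0.0] * (M + 1) for _ in range(M + 1)]
    for i in range(M + 1):
        for j in (i - 1, i + 1):
            if 0 <= j <= M:
                R[i][j] = 0.5 * min(1.0, r[j] / r[i])
        # Rejected proposals and boundary holding end up on the diagonal.
        R[i][i] = 1.0 - sum(R[i])
    return r, R

if __name__ == "__main__":
    r, R = metropolis_binomial_chain(8)
    balanced = all(
        abs(r[i] * R[i][j] - r[j] * R[j][i]) < 1e-12
        for i in range(9) for j in range(9)
    )
    print("detailed balance holds:", balanced)
```

In this chain the edge flow $r(i)R(i,i+1)$ equals $\frac{1}{2}\min\{r(i), r(i+1)\}$, which is the quantity compared with $\pi(\Omega_i)\bar{Q}(i,i+1)$ in the text.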
In order to use Lemma A.7 in the appendix first note that

$$\pi(\Omega_i) \in \binom{M}{i} 2^{-M}\,[a^{-1}, a] = r(i)\,[a^{-1}, a] \tag{4.15}$$

implies $r(i) \ge \frac{1}{a}\pi(\Omega_i)$ for all $0 \le i \le M$. Second we conclude for $0 \le i < M$,

$$r(i)R(i, i+1) = r(i)S(i, i+1) \min\left\{1, \frac{r(i+1)}{r(i)}\right\}$$

$$= \frac{1}{2}\, r(i) \min\left\{1, \frac{\binom{M}{i+1}}{\binom{M}{i}}\right\}$$

$$= \frac{1}{2^{M+1}} \cdot \begin{cases} \binom{M}{i} \frac{M-i}{i+1} & \text{if } i \ge \frac{M-1}{2} \\ \binom{M}{i} & \text{otherwise.} \end{cases}$$
Fixing $A := 4aM e^{\beta \frac{N(K+1)}{M}}$ it is now straightforward to check that

$$r(i)R(i, i+1) \le A\,\pi(\Omega_i)\,\bar{Q}(i, i+1) \tag{4.16}$$
holds, for any $i$. Now Lemma A.7 in the appendix yields the desired inequality

J_ e -^.«.i_ £ « Gap(JJ)£Gap( g, (4 , 7 ) 

□ 

4.3. Speed of convergence of $QP_+Q$. Ellis et al. [11] show a continuous phase
transition in the state space for these values of $K$. All but exponentially little mass is
located around

$$a_{\max}(0) := \left(\frac{e^{-\beta}}{1+2e^{-\beta}}, \frac{1}{1+2e^{-\beta}}, \frac{e^{-\beta}}{1+2e^{-\beta}}\right) \in T_\infty \tag{4.18}$$

for $\beta < \beta_c^{(2)}(K)$ and for $\beta > \beta_c^{(2)}(K)$ all but exponentially little mass is located around
the points

$$a_{\max}(-1) := \left(\frac{e^{2\beta K z_a - \beta}}{C(\beta,K)}, \frac{1}{C(\beta,K)}, \frac{e^{-2\beta K z_a - \beta}}{C(\beta,K)}\right) \in T_\infty \tag{4.19}$$

$$a_{\max}(1) := \left(\frac{e^{-2\beta K z_a - \beta}}{C(\beta,K)}, \frac{1}{C(\beta,K)}, \frac{e^{2\beta K z_a - \beta}}{C(\beta,K)}\right) \in T_\infty \tag{4.20}$$

with $C(\beta,K) = 1 + e^{-2\beta K z_a - \beta} + e^{2\beta K z_a - \beta}$ being the normalization constant and
$z_a(\beta,K) > 0$ as constructed but not computed in [11], also see the appendix for an
insight into the technical problems one faces. The standard Metropolis chain would get
stuck in either of the regions around $a_{\max}(1)$ or $a_{\max}(-1)$ as it is exponentially unlikely
for the chain to leave either of these local states. The swapping chain circumvents this
bottleneck by swapping a component located close to $a_{\max}(-1)$ up to $\beta < \beta_c^{(2)}(K)$ at
which temperature the Metropolis chain is rapidly mixing on the whole state space. It
will find a state close to $a_{\max}(0)$ and, if suggested to increase $\beta$, it will choose either of
the two paths leading to $a_{\max}(-1)$ or $a_{\max}(1)$ with equal probability. The bottleneck
encountered in the intermediate regime $K_{low} < K < K_c$, which is described and used
in Section 5, will not pose a problem, as

$$\beta \mapsto \begin{cases} a_{\max}(0) & \text{if } \beta < \beta_c^{(2)}(K) \\ a_{\max}(1) & \text{if } \beta > \beta_c^{(2)}(K) \end{cases} \tag{4.21}$$

is continuous in the present case $K > K_c$.

To formalize this, a technique introduced by Bhatnagar and Randall [2, Sec. 4.1] (in
a modified form) will prove to be a powerful tool for showing rapid mixing of $QP_+Q$.
We need to recall the notation of $A_{s,r}$ introduced in (3.5). Assume $\beta$ is big enough,
such that the function $f_\beta$ introduced in (3.10) on the field $A = (A_{s,r})_{s,r}$ has two local
maxima, such that it has two local modes. Inspired by (3.9) we define a probability
measure $P_{\beta,N}$ on $B := \{(a_{-1}, a_1) \in [0,1]^2 \mid a_{-1} + a_1 \le 1 \text{ and } a_{-1} \le a_1\}$ by

$$\frac{dP_{\beta,N}}{d\lambda}(a_{-1}, a_1) := \frac{1}{Z_\beta(N)}\, e^{N f_\beta(a_{-1},\, 1 - a_{-1} - a_1,\, a_1)} \tag{4.22}$$

where $\lambda$ denotes the Lebesgue measure restricted to the subset $B$ and $Z_\beta(N)$ denotes
the normalization constant. Let $a_g(\beta_{i_c})$ denote the unique local maximum point of
$f_{\beta_{i_c}}$ on $B$ at the next to critical temperature

$$i_c := \max\{i \mid \beta_i < \beta_c^{(2)}(K)\}.$$

Further define the set

$$V := \{a_{\max}(1) \mid \beta > \beta_c^{(2)}(K)\}$$
which defines a continuous path from $a_{\max}(0)(\beta_c^{(2)}(K))$ to $(0,0,1)$ in $B$. Take $V$ to
be an ordered set with the previously implied ordering. The path $V$ separates $B$ into
two disjoint parts $B_g \cup B_l = B$ with $V \subset B_g$. Obviously

$$P_{\beta_c^{(2)}(K),N}(B_g) = 1 - P_{\beta_c^{(2)}(K),N}(B_l) \to c \in (0,1)$$

for some $K$-specific constant $c$ as $N \to \infty$. Remembering the model's phase behavior
we will define $B_g$ and $B_l$ by $(\frac{1}{2}, \frac{1}{2}) \in B_g$ while $(0,0) \in B_l$ as this notation reflects
where the global and local maxima appear. With the definition of

$$\mathcal{A}_g(\beta_{i_c}) := (B_g \cap T) \quad \text{and} \quad \mathcal{A}_l(\beta_{i_c}) := (B_l \cap T)$$

we know by continuity of $\pi_\beta$ in $\beta$ that $\pi_{\beta_{i_c}}(\mathcal{A}_g(\beta_{i_c})) \to c$ and consequently
$\pi_{\beta_{i_c}}(\mathcal{A}_l(\beta_{i_c})) \to 1 - c$. For any $i \in \{i_c + 1, \dots, M\}$ there exist two local maxima, the
global one denoted by $a_g(\beta_i)$ and the local (non-global) one denoted by $a_l(\beta_i)$. We
define $\mathcal{A}_g(\beta_i)$ and $\mathcal{A}_l(\beta_i)$ by

there is no nondecreasing path from $a$ to $a_l$ $\Rightarrow$ $a \in \mathcal{A}_g(\beta_i)$ (4.23)
there is no nondecreasing path from $a$ to $a_g$ $\Rightarrow$ $a \in \mathcal{A}_l(\beta_i)$ (4.24)

there exist nondecreasing paths from $a$ to $a_g$ and from $a$ to $a_l$ $\Rightarrow$
$a \in \mathcal{A}_g(\beta_i)$ if $a \in \mathcal{A}_g(\beta_{i-1})$, and $a \in \mathcal{A}_l(\beta_i)$ if $a \in \mathcal{A}_l(\beta_{i-1})$ (4.25)

Note that for each $i$ the sets $\mathcal{A}_g(\beta_i)$ and $\mathcal{A}_l(\beta_i)$ form a partition of $B$, since otherwise
$f_\beta$ would need to have more than two maxima on $B$, in contradiction to Theorem 3.1.
It will prove convenient to have

Lemma 4.2. $(\pi_i(\mathcal{A}_g(\beta_i)))_{i \in \{i_c, \dots, M\}}$ is monotonically increasing, while $(\pi_i(\mathcal{A}_l(\beta_i)))_{i \in \{i_c, \dots, M\}}$
is monotonically decreasing.

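Proof. This proof consists of multiple parts. We will first establish that for $\beta > \beta_c^{(2)}(K)$

$f_\beta(a_{\max}(0))$ is monotonically decreasing, while (4.26)
$f_\beta(a_{\max}(1))$ is monotonically increasing. (4.27)

This is a straightforward calculation. Inserting $a_{\max}(0)(\beta)$ into $f_\beta$ yields

$$\frac{d f_\beta(a_{\max}(0))}{d\beta} = -\frac{2e^{-\beta}}{1 + 2e^{-\beta}}$$

thus (4.26). Defining the canonical free energy of a thermodynamical system by

$$\varphi(\beta) := \lim_{N \to \infty} \frac{1}{N} \log(Z_\beta(N)) \tag{4.28}$$

it follows from (3.9) that in the interesting phase of $\beta > \beta_c^{(2)}(K)$

$$\varphi(\beta) = f_\beta(a_{\max}(1)), \tag{4.29}$$
as

$$\varphi(\beta) = \lim_{N \to \infty} \frac{1}{N} \log(Z_\beta(N))$$

$$= \lim_{N \to \infty} \frac{1}{N} \log\Big( \sum_{a \in T_N} e^{N f_\beta(a)} \Big)$$

$$\le \lim_{N \to \infty} \frac{1}{N} \log\big(N^2 e^{N f_\beta(a_{\max}(1))}\big)$$

$$= \lim_{N \to \infty} \Big(\frac{2}{N}\log(N) + f_\beta(a_{\max}(1))\Big)$$

$$= f_\beta(a_{\max}(1))$$

and

$$\varphi(\beta) = \lim_{N \to \infty} \frac{1}{N} \log(Z_\beta(N))$$

$$= \lim_{N \to \infty} \frac{1}{N} \log\Big( \sum_{a \in T_N} e^{N f_\beta(a)} \Big)$$

$$\ge \lim_{N \to \infty} \frac{1}{N} \log\big(e^{N f_\beta(a_{\max}(1))}\big)$$

$$= f_\beta(a_{\max}(1)).$$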

Differentiating for a fixed state $x = (x_{-1}, x_0, x_{+1})$ in the domain of $f_\beta$ gives us

$$\frac{\partial f_\beta(x)}{\partial \beta} = x_0 - 1 + K(x_1 - x_{-1})^2 \tag{4.30}$$

which implies

$$\frac{\partial f_\beta}{\partial \beta}(0, 0, 1) = K - 1 > 0.$$

This guarantees $f_\beta(a_{\max}(1))$ to be strictly increasing for sufficiently large $\beta$. Together
with the general fact (see for instance [13] or, for a non-rigorous overview, [12]) that
$\varphi(\beta)$ is concave for $\beta > \beta_c^{(2)}(K)$ we gain (4.27).

In the second step we will confirm that there is no point-movement from $\mathcal{A}_g$ to $\mathcal{A}_l$
by going from $\beta_i$ to $\beta_{i+1}$ for all $i_c \le i \le M - 1$. For this, first note that any point
$x$ which has a nondecreasing path to any point $y \in V$ also has a nondecreasing path
to $a_g$. Assume this to be wrong:

First note that $f_0$ is monotonically decreasing on $V$. Assume it would not be, then
there are two points $z_1, z_2 \in V$ with $f_0(z_1) = f_0(z_2)$. As $a_{\max}(1)$ is continuously
moving from $a_{\max}(0)(\beta_c^{(2)}(K))$ to $(0, 0, 1)$ there needs to be a $\beta' > \beta_c^{(2)}(K)$ such that
$f_{\beta'}(z_1) > f_{\beta'}(z_2)$. Of course, there also needs to be a $\beta'' > \beta'$ such that $f_{\beta''}(z_1) <
f_{\beta''}(z_2)$. This contradicts (4.30).

Coming back to the original contradiction argument: By assumption, there exists a
$\beta > \beta_c^{(2)}(K)$ such that $f_\beta$, if restricted to $V$, has at least two modes - where, without
loss of generality, the highest one is in the one containing $a_{\max}(0)(\beta_c^{(2)}(K))$. Take
$z \in V$ to be a local minimum. The points $z'$ just further away from $a_{\max}(0)(\beta_c^{(2)}(K))$
than $z$ must thus satisfy

$$\frac{\partial f_\beta(z')}{\partial \beta} > \frac{\partial f_\beta(z)}{\partial \beta}$$

as $f_0$ is monotonically decreasing on $V$ and the derivative of $f_\beta$ with respect to $\beta$
does not depend on $\beta$. This warrants $f_{\beta'}(z) < f_{\beta'}(z')$ for all $\beta' > \beta$ (again for
the same reason), which in turn implies either that $a_{\max}(1)$ stays left of $z$ for all $\beta$ or that
$a_{\max}(1)$ exhibits a discontinuous behavior close to $z$. Both contradict a combination
of Theorem 3.1 and the continuity of $a_{\max}(1)$.

This directly implies that every point $x \in \mathcal{A}_g(\beta_{i_c})$ stays in $\mathcal{A}_g$ for all $i$, as any
(nondecreasing) path leading from $x$ to $a_l(\beta_i)$ will need to cross the set $V$. A point
$x \in \mathcal{A}_g(\beta_i)$ which does not lie in $\mathcal{A}_g(\beta_{i_c})$ must have been forced to switch from $\mathcal{A}_l$ to
$\mathcal{A}_g$ at some index $i_c < j \le i$. This means $x$ is being separated from $a_l$ by some path.
Due to an argument close to the one given before, this path will block the way from
$x$ to $a_l$ for any $i > j$, such that again, $x \in \mathcal{A}_g(\beta_{i+1})$.

Now, for any $\beta > \beta_c^{(2)}(K)$ it follows from similar calculations as for equation (4.29)
that

$$\lim_{N \to \infty} \frac{1}{N} \log(\pi_\beta(\mathcal{A}_g)) = 0 \tag{4.31}$$

$$\lim_{N \to \infty} \frac{1}{N} \log(\pi_\beta(\mathcal{A}_l)) = f_\beta(a_{\max}(0)) - f_\beta(a_{\max}(1)) \tag{4.32}$$

which together with the first and second argument yields the claim. □
For later use we need the following partitioning of the state space.
Definition 4.3 (Definition 4.1 of [2]). For $x \in \Omega_+^M$ define the trace

$$\mathrm{Tr}(x) = t \in \{0, 1\}^M$$

with $t_i = 0 \Leftrightarrow x_i \in \mathcal{A}_l$ and $t_i = 1 \Leftrightarrow x_i \in \mathcal{A}_g$ to indicate which part of the state
space each component is in.
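The trace is simply a bit vector recording, component by component, on which side of the partition a state lies. A minimal sketch follows. The membership test `toy_in_global_part` is a made-up placeholder (the actual sets are defined through nondecreasing paths in (4.23) to (4.25)), and indices are 0-based in the code.

```python
from collections import defaultdict

def trace(x, in_global_part):
    """Return Tr(x) as a tuple of bits: bit i is 1 if component i lies in the
    'global' part and 0 if it lies in the 'local' part."""
    return tuple(1 if in_global_part(i, x_i) else 0 for i, x_i in enumerate(x))

def group_by_trace(states, in_global_part):
    """Partition a collection of states into classes with equal trace."""
    classes = defaultdict(list)
    for x in states:
        classes[trace(x, in_global_part)].append(x)
    return dict(classes)

def toy_in_global_part(i, magnetization):
    # Hypothetical membership rule used only for illustration: a component is
    # declared 'global' when its (made-up) magnetization is at least 1/2.
    return magnetization >= 0.5

if __name__ == "__main__":
    states = [(0.9, 0.1, 0.7), (0.8, 0.2, 0.6), (0.1, 0.1, 0.9)]
    for t, members in group_by_trace(states, toy_in_global_part).items():
        print(t, members)
```

States with the same trace form one block of the partition on which the restricted chain is later analysed.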

The $2^{M - i_c + 1}$ possible values of $\mathrm{Tr}(x)$ characterize the partitioning

$$\Omega_+^M = \bigcup_{t \in \{0,1\}^M} \Omega_{+,t} \tag{4.33}$$

(with the canonical definition of $\Omega_{+,t}$) we will use. First using Lemma A.8 in the
appendix for (4.34), Lemma A.9 for (4.35) and afterwards Theorem A.10 we obtain

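$$\mathrm{Gap}(QP_+Q) \ge \frac{1}{3}\,\mathrm{Gap}(QP_+Q\,QP_+Q\,QP_+Q) \tag{4.34}$$

$$\ge \frac{1}{3}\,\mathrm{Gap}\big((QP_+Q)^{1/2}\,(QP_+Q)\,(QP_+Q)^{1/2}\big) \tag{4.35}$$

$$\ge \frac{1}{3}\,\mathrm{Gap}(\bar{Q}) \cdot \min_t \big\{ \mathrm{Gap}\big((QP_+Q)|_{\mathrm{Tr}^{-1}(t)}\big) \big\} \tag{4.36}$$

where $\bar{Q}$ is an abbreviation for the aggregated chain $\overline{QP_+Q}$. We can argue as in (4.7)
to get

$$\mathrm{Gap}(QP_+Q) \ge \frac{1}{3}\,\mathrm{Gap}(\bar{Q}) \cdot \min_t \big\{ \mathrm{Gap}\big((QP_+Q)|_{\mathrm{Tr}^{-1}(t)}\big) \big\}$$

$$\ge \frac{1}{24}\,\mathrm{Gap}(\bar{Q}) \cdot \min_t \big\{ \mathrm{Gap}\big(Q|_{\mathrm{Tr}^{-1}(t)}\, P_+|_{\mathrm{Tr}^{-1}(t)}\, Q|_{\mathrm{Tr}^{-1}(t)}\big) \big\}$$

$$\ge \frac{1}{24}\,\mathrm{Gap}(\bar{Q}) \cdot \min_t \big\{ \mathrm{Gap}\big(P_+|_{\mathrm{Tr}^{-1}(t)}\big) \big\} \tag{4.37}$$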

where the last inequality uses Lemma A.9 again. This looks promising, as the set
$\mathrm{Tr}^{-1}(t)$ is unimodal in each component as constructed, and thus the chain $P_+|_{\mathrm{Tr}^{-1}(t)}$
should be fast on this subset. $\bar{Q}$ will be comparable to a very simple random walk,
which is known to be rapidly mixing, thus leading to a polynomial lower bound for
$\mathrm{Gap}(QP_+Q)$.
4.3.1. Speed of convergence of the aggregated chain $\bar{Q}$. In the wake of Bhatnagar and
Randall [2, Theorem 4.4] we define the probability measure

M 

m--=U^{ TT ^^) ( 4 - 38 ) 

on the state space

$$\tilde\Omega = \prod_{i=1}^{i_c - 1} \{1\} \times \prod_{i=i_c}^{M} \{0, 1\}. \tag{4.39}$$

A simple reversible random walk RW1 with respect to $\tilde\pi$ to compare $\bar{Q}$ on $\tilde\Omega$ to
would be the following. Start at some $t \in \tilde\Omega$ and either switch the component $t_{i_c}$
from 0 to 1 or vice versa with the Metropolis probabilities induced by $\tilde\pi$, or choose
an $i \in \{i_c, \dots, M - 1\}$ at random and interchange components $i$ and $i + 1$ according
to a Metropolis update with regard to $\tilde\pi$ as well, such that $t \to (i, i+1)t$. Again, for
technical reasons RW1 does not act on $t$ at all with probability $\frac{1}{2}$. In order to analyze
RW1 we will compare it with an even simpler random walk RW2 on $\tilde\Omega$ which picks
an $i \in \{i_c, \dots, M\}$ at random and updates $t_i$ by choosing $t_i$ exactly according to the
stationary distribution $\tilde\pi_i$. It is apparent that after this move, the $i$th component
of $t$ is in equilibrium. Using the coupon collector's theorem (see for instance (2.7),
(5.10) and (12.12) in [18]), we get easily

Lemma 4.4. Let R denote the transition kernel of RW2 . Then 

Gap( J R) > — — — . 
FV ; ~ 4MlogM 
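To make RW2 concrete, the sketch below writes out its transition kernel for a small number $n$ of free bits (in the text $n = M - i_c + 1$), with a hypothetical vector `p` of marginal probabilities $p_j = \tilde\pi_j(1)$. It is an illustration under these assumptions, not the argument behind Lemma 4.4. In this toy setting one can check by hand that the product measure is reversible and that each centred coordinate $t_j - p_j$ is an eigenfunction with eigenvalue $1 - 1/n$.

```python
from itertools import product

def rw2_kernel(p):
    """Transition kernel of the coordinate-refresh walk on {0,1}^n.

    A coordinate j is chosen uniformly and t_j is redrawn from its
    marginal: 1 with probability p[j], 0 with probability 1 - p[j].
    Returns the list of states and a dict of dicts kernel[t][t_new].
    """
    n = len(p)
    states = list(product((0, 1), repeat=n))
    kernel = {}
    for t in states:
        row = {}
        for j in range(n):
            for bit in (0, 1):
                t_new = t[:j] + (bit,) + t[j + 1:]
                weight = (p[j] if bit == 1 else 1.0 - p[j]) / n
                row[t_new] = row.get(t_new, 0.0) + weight
        kernel[t] = row
    return states, kernel

def product_measure(t, p):
    """Probability of the bit vector t under independent marginals p."""
    w = 1.0
    for bit, pj in zip(t, p):
        w *= pj if bit == 1 else 1.0 - pj
    return w

if __name__ == "__main__":
    p = [0.3, 0.6, 0.8]
    states, kernel = rw2_kernel(p)
    t = states[5]
    # Apply the kernel to the centred first coordinate and compare.
    image = sum(w * (u[0] - p[0]) for u, w in kernel[t].items())
    print(image, (1 - 1 / len(p)) * (t[0] - p[0]))
```

The two printed numbers agree up to rounding, reflecting that a coordinate forgets its initial value as soon as it is refreshed, which is the mechanism behind the coupon collector argument.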

This leads directly to 

Theorem 4.5. The aggregated chain $\bar{Q}$ of the Swapping Markov chain is rapidly
mixing on $\tilde\Omega$ for $K > K_c$.

Remark Again we refrain from giving an explicit bound (also recall the remark after
Theorem 3.4).

Proof. The main idea is to give a canonical path in RW1 in which every step
compares well to the rapidly mixing chain $R$. Consider a single transition $(t, t')$ in $R$,
thus $t' = (t_1, \dots, t_{i-1}, 1 - t_i, t_{i+1}, \dots, t_M)$ for one $i \ge i_c$. Now consider the concatenation
$p_1 \circ p_2 \circ p_3$ of the three paths

• $p_1$ consists of the $i - i_c$ swap moves from $t$ to

$t^{(1)} = (t_1, \dots, t_{i_c - 1}, t_i, t_{i_c}, \dots, t_{i-1}, t_{i+1}, \dots, t_M)$

• $p_2$ is the one step from $t^{(1)}$ to

$t^{(2)} = (t_1, \dots, t_{i_c - 1}, 1 - t_i, t_{i_c}, \dots, t_M)$

• $p_3$ consists of the $i - i_c$ steps needed to swap the $i$th component back up, thus
$p_3$ is the path from $t^{(2)}$ to

$t^{(3)} = (t_1, \dots, t_{i_c}, \dots, t_{i-1}, 1 - t_i, \dots, t_M)$.
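The three pieces of the canonical path can be generated mechanically. The following sketch (hypothetical function name, 0-based indices, so `i_c` and `i` are list positions) returns the sequence of bit vectors visited when bit $i$ is carried down to position $i_c$ by adjacent swaps, flipped there, and carried back.

```python
def canonical_path(t, i, i_c):
    """States visited when flipping bit i of t via position i_c (requires i >= i_c).

    p1: adjacent swaps move the bit from position i down to position i_c.
    p2: the bit at position i_c is flipped.
    p3: adjacent swaps move the flipped bit back up to position i.
    """
    state = list(t)
    path = [tuple(state)]
    for j in range(i, i_c, -1):          # p1
        state[j - 1], state[j] = state[j], state[j - 1]
        path.append(tuple(state))
    state[i_c] = 1 - state[i_c]          # p2
    path.append(tuple(state))
    for j in range(i_c, i):              # p3
        state[j], state[j + 1] = state[j + 1], state[j]
        path.append(tuple(state))
    return path

if __name__ == "__main__":
    for step in canonical_path((1, 0, 1, 1, 0), i=3, i_c=1):
        print(step)
```

The path uses $2(i - i_c) + 1$ transitions, and some swaps leave the state unchanged when the two exchanged bits are equal.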

In order to be able to use Lemma A.6 in the appendix we will establish that

$$\tilde\pi(z)\,RW1(z, z') \ge \frac{1}{2}\,\tilde\pi(t)\,R(t, t') \tag{4.40}$$

holds for any transition $(z, z')$ in the canonical path $p_1 \circ p_2 \circ p_3$.
Transition along $p_1$: Let $z = (t_1, \dots, t_{i_c}, \dots, t_{j-1}, t_i, t_j, \dots, t_M)$ for a $j \in \{i_c + 1, \dots, M\}$
and $z' = (j-1, j)z$. It is easy to verify

$$\tilde\pi(z)\,RW1(z, z') = \frac{1}{2(M - i_c + 1)}\,\tilde\pi(z) \min\left(1, \frac{\tilde\pi(z')}{\tilde\pi(z)}\right)$$

$$= \frac{1}{2(M - i_c + 1)} \min\big(\tilde\pi(z), \tilde\pi(z')\big) \tag{4.41}$$

and for $t$, $t' = (t_1, \dots, t_{i-1}, 1 - t_i, \dots, t_M)$ for one $i \ge i_c$,

$$\tilde\pi(t)\,R(t, t') \le \frac{1}{M - i_c + 1}\,\tilde\pi(t^*) \tag{4.42}$$

with

$$t^* = (t_1, \dots, 0, \dots, t_M).$$

Thus it suffices to show $\tilde\pi(t^*) \le \tilde\pi(z)$ and $\tilde\pi(t^*) \le \tilde\pi(z')$. We will show this for $z$ only,
as the argument works exactly the same for both $z$ and $z'$. It is useful to partition
$t^*$ into blocks of bits $t_l$ that equal 1, separated by one or more zeros. Let $i_c \le k < i$
be the largest value that satisfies $t_k = 0$. Using Lemma 4.2 it is straightforward to
verify

$$\prod_{l=k+1}^{i} \tilde\pi_l(z_l) \ge \prod_{l=k+1}^{i} \tilde\pi_l(t^*_l).$$

Similarly, consider the next block of 1s in $t^*$, until the first index $k'$ such that $t_{k'} = 0$,

$$\prod_{l=k'+1}^{k} \tilde\pi_l(z_l) \ge \prod_{l=k'+1}^{k} \tilde\pi_l(t^*_l).$$

Continuing in this way we find

$$\prod_{l=j}^{i} \tilde\pi_l(z_l) \ge \prod_{l=j}^{i} \tilde\pi_l(t^*_l)$$

and thus

$$\tilde\pi(z) \ge \tilde\pi(t^*).$$
In an analogous fashion one can also show

$$\tilde\pi(z') \ge \tilde\pi(t^*)$$

such that (4.40) holds on all transitions in $p_1$.

Transition along $p_2$: The same argument as before yields

$$\min\big(\tilde\pi(z), \tilde\pi(z')\big) \ge \tilde\pi(t^*)$$

for $(z, z') \in p_2$.

Transition along $p_3$: This is exactly as the case of $p_1$.

We find that for any edge $(z, z')$ in the canonical path equation (4.40) is satisfied, so
what needs to be done in order to show rapid convergence of RW1 to equilibrium is
to ensure that not too many paths use the same transition $(z, z')$. With the notation
of Lemma A.6 below, we can obviously bound the number of paths in $E(z, z')$ by $M$
and as any path $\gamma_{tt'}$ has at most $2M + 1$ many transitions, we can guarantee

$$A = \max_{(z,z')} \left\{ \frac{1}{\tilde\pi(z)\,RW1(z, z')} \sum_{E(z,z')} |\gamma_{tt'}|\,\tilde\pi(t)\,R(t, t') \right\} \le 4M^2 + 2M \tag{4.43}$$

which leads to $\mathrm{Gap}(RW1) \ge (2(2M^3 + M^2) \log(M))^{-1}$.

It remains to compare RW1 with $\bar{Q}$. We will do so by means of case differentiation.
First consider the case of $z' = (i, i+1)z$ with $z_i = 1$, $z_{i+1} = 0$ in which we will show


Q(z,z')>^e-P^RWi(z,z') 



(4.44) 



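where the term $e^{\beta \frac{N(K+1)}{M}}$ is of order $O(1)$ as $M = c_1 N$. So taking $z' = (i, i+1)z$
with $z_i = 1$, $z_{i+1} = 0$ leads to

$$RW1(z, z') = \frac{1}{2(M - i_c + 1)} \min\left(1, \frac{\tilde\pi(z')}{\tilde\pi(z)}\right) = \frac{1}{2(M - i_c + 1)} \tag{4.45}$$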

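as $\tilde\pi_i(1) \le \tilde\pi_{i+1}(1)$ and $\tilde\pi_i(0) \ge \tilde\pi_{i+1}(0)$. The equivalent for $\bar{Q}$ yields with
$B := \{x \in \Omega_{+,z} \mid x_i \in B_\varepsilon(a_g) \cap \mathcal{A}_g,\ x_{i+1} \in B_\varepsilon(a_l) \cap \mathcal{A}_l\}$

$$\bar{Q}(z, z') = \frac{1}{\tilde\pi(z)} \sum_{x \in \Omega_{+,z}} \sum_{y \in \Omega_{+,z'}} \pi(x)\,(QP_+Q)(x, y)$$

$$\ge \frac{1}{4\tilde\pi(z)} \sum_{x \in \Omega_{+,z}} \sum_{y \in \Omega_{+,z'}} \pi(x)\,Q(x, y)$$

$$= \frac{1}{4\tilde\pi(z)} \Big( \sum_{x \in B} \pi(x)\,Q(x, (i, i+1)x) + \sum_{x \in \Omega_{+,z} \setminus B} \pi(x)\,Q(x, (i, i+1)x) \Big)$$

$$\ge \frac{1}{4\tilde\pi(z)} \sum_{x \in B} \pi(x)\,Q(x, (i, i+1)x)$$

$$\ge \frac{1}{4\tilde\pi(z)} \cdot \frac{1}{2(M+1)}\, e^{-\beta \frac{N(K+1)}{M}}\, \pi(B) \tag{4.46}$$

$$\ge \frac{1}{8(M+1)}\, e^{-\beta \frac{N(K+1)}{M}}\, (1 - e^{-cN}). \tag{4.47}$$

Equation (4.46) is obtained analogously to (4.10). For (4.47) we use Theorem 3.1
which implies that

$$\frac{\pi_i(B_\varepsilon(a_g) \cap \mathcal{A}_g)\; \pi_{i+1}(B_\varepsilon(a_l) \cap \mathcal{A}_l)}{\pi_i(\mathcal{A}_g)\; \pi_{i+1}(\mathcal{A}_l)} \ge 1 - e^{-cN}$$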



for some $c > 0$. Second consider $z' = (i, i+1)z$ with $z_i = 0$, $z_{i+1} = 1$ which leads to
and with $B' := \{x \in \Omega_{+,z} \mid x_i \in B_\varepsilon(a_l) \cap \mathcal{A}_l,\ x_{i+1} \in B_\varepsilon(a_g) \cap \mathcal{A}_g\}$

ItE E n(x){QP + Q)(x,y) 



7t(z) 



1 



> 



1 



> 



4tt(z 
1 



> 



An(z 
1 



An(z 
1 



4tt(z 



Yl Yl *{x)Q{x,y) 



^ ir(x)Q(x, (i, i + l)x) 
} j n(x)Q(x, (i, i + l)x) 



2(M + 1) 



(4.49) 



2(M + 1) 



> 



7T U 



8(M + 1) 5r(«) 



, jV(X+l) „ , 



p M (1 



-cN\ 



(4.50) 



The arguments for (4.49) and (4.50) are the same as above. The two remaining cases
of $z' = (z_0, \dots, 1 - z_{i_c}, \dots, z_M)$ with $z_{i_c} \in \{0, 1\}$ are dealt with automatically by showing
rapid mixing of $P_{i_c}$ on $\mathcal{A}_g = A$. The claim follows by using Lemma A.7. □

4.3.2. Rapid Mixing in $\mathcal{A}_g$ and $\mathcal{A}_l$. It remains to show rapid convergence to equilibrium
of $P_+|_{\mathrm{Tr}^{-1}(t)}$ as constructed in (4.37). Using Theorem A.11 we can stick to the
case of

for fixed $t$ and $i$. Using Lemma A.8 with $m = 3$ gives us

$$\mathrm{Gap}(T) \ge \frac{1}{3}\,\mathrm{Gap}(T^3)$$

which will prove to be simpler to handle than $T$ itself. We will only deal with the
case of $\mathcal{A}_g$ as the case of $\mathcal{A}_l$ works the same. Consider the disjoint union

$$\mathcal{A}_g = \bigcup A_{s,r} \tag{4.51}$$

and decompose the state space accordingly. This leads to

$$\mathrm{Gap}(T^3) = \mathrm{Gap}(T^{1/2}\, T^2\, T^{1/2}) \ge \mathrm{Gap}(\bar{T}) \cdot \min_{s,r} \mathrm{Gap}(T^2_{s,r}) \tag{4.52}$$

which may now make apparent why dealing with $T^3$ is an advantage over dealing
with $T$. Here $\bar{T}$ is the aggregated chain defined as $\bar{Q}$ in Theorem A.10. Restricting
$T^2$ to $A_{s,r}$ will still give us a nontrivial chain, whilst the restriction of $T$ to $A_{s,r}$ would
deterministically stay in the originally occupied state.

Theorem 4.6. $\mathrm{Gap}(\bar{T}) \ge \frac{1}{4} N^{-5}$

Proof. This is already well prepared. As constructed earlier, $f_\beta$ fulfills a unimodality
condition on $\mathcal{A}_g$. Thus we can easily choose one path $\gamma_{xy}$ for any given sets $x$ and
$y$ that is unimodal. Each such path has at most length $N^2$, such that the Poincaré
inequality given in Lemma A.5 simplifies to

A = /A m T s ^TTa \t(a — 2 — V E \iz^\n{zi)n{z-2) 

{A s ,r,A 3 i, r >) 7ri{A s ,r)-l [Ag.r, A s >y) ^/a a \ 

7zj z 2 3 \<A-s,r i^ s t v i ) 

< N 2 max = r K i {zi)Tx i {z2) 

(A s , r ,A s >y) 7ri(A S}r )T( A s , r , A s ' r ') ^/a A \ 



AT 2 ^i{zx)^i{z 2 ) 

JS max > 

/ J _ A . . \ — <■ 



(As,r,A s >y) fr* 7Ti(A Sir )T(A S) r,A s 



(4.53) 



It is now of interest how $\bar{T}$ behaves. Given $A_{s,r} \ne A_{s',r'}$ with $\bar{T}(A_{s,r}, A_{s',r'}) > 0$, we
first consider the case $\pi_i(\sigma) \le \pi_i(\sigma')$ for $\sigma \in A_{s,r}$ and $\sigma' \in A_{s',r'}$. Note that $\pi_i(\sigma)$ is
independent of the choice of $\sigma \in A_{s,r}$.

T(A s , r ,A s > >r >) = 1 ^2 E 7r i(°") T (°"> (T/ ) 

= E E 

1 

~ AN 

The second case $\pi_i(\sigma) > \pi_i(\sigma')$ uses the reversibility of $T$ together with

T(A,r, A 8 >y) = —T-T-T E E ^0)^0,0 

= ^y E E *(«W,„) 

= 4^wxi E E ^(O 



> 1 7Ti(A 



4iV 7Ti(A 



To further analyze (4.53) we will take the worst case scenario $\frac{\pi_i(A_{s',r'})}{\pi_i(A_{s,r})} < 1$ and for
inequality (4.54) recall that all paths are unimodal:

7^(21)^(^2) 



A < N 2 max 

t A A , A 



(A s ,r,A s >y) ia~^ a \' K i{A sr )T(A s , r . l A s iy) 

<4iV 3 max V *A^A 

lz 1 z 2 B{A 3 ,r,A 3 iy) 

= 4iV 3 max V ^\ ^ M A sr ) 

7z 1 z 2 B{As,r,A s ,y) 

< AN 5 . (4.54) 

□
Theorem 4.7. Gap(7j.) > g^re^-^.

Proof. We need to consider two cases. The first is $A_{N,N}$ in which case $|A_{N,N}| = 1$,
such that $T^2_{N,N}$ is the constant chain, and therefore rapidly mixing. The other case
is $A_{s,r}$ with $s < \min\{r, N - 1\}$. Let $\sigma, \sigma' \in A_{s,r}$ with $\sigma \ne \sigma'$. We will compare
$T^2_{s,r}$ with the Markov chain $(X_i)_i$ given in Appendix B. Assume $(j, k)\sigma = \sigma'$ for some
$j, k \in \{1, \dots, N\}$. Otherwise $T^2_{s,r}(\sigma, \sigma') = P(X_{i+1} = \sigma' \mid X_i = \sigma) = 0$. We know

P(X l+1 = = a) = i 

and 

Tj>,aO>T(a,r)T(r,a') 

for a fixed r. It is obvious that either T(a,r) = ^ or T(r, a') = Due to the 
symmetry assume 

r := erj_i, cr fe , o-j+i, CTfc, ...(Tat) 

and conclude 

X e/3 (7V-i?(r))-^5 2 (r) 

T(a,r) = — minjl,— ^3^) 

= ^min{l,e^-^ + ^( s2 - s2 «)) 
AN I i 

= — minjl e «-««)+^(-««)( s + s «)\ 
~ AN 

such that taking r = (01, (Tj-i, o~k, Cj+i, •■•Cat), where, without loss of generality, 
(jfc > crj yields 

And we can easily deduce from Lemma IB. 1 1 that Gap(AT) > ^4 (see |18j for instance). 
Then Lemma A.7 proves the claim. □



5. Proof of Theorem 3.2

In this section we will prove Theorem 3.2 which concerns the case $K_{low} < K < K_c$.
This is done in three parts. We first give the general idea why slow mixing should be
expected. We then support this idea with the necessary calculations in the remaining
parts.

5.1. The idea. We will follow Gore and Jerrum [TH] in order to find a bad cut in
the state space of BEG for $\beta > \beta_c^{(1)}(K)$. Using their technique we can show that the
Metropolis chain has to overcome an exponential barrier to leave any local maximum.
We will show that an $\varepsilon$-stripe around the 0-axis contains such a maximum, with $\varepsilon$
independent of $\beta_i$. Intuitively speaking this leads to the following behavior of the
Tempering chain. At $\beta_i$ close to 0 the chain will find the unique global maximum on
the 0-axis. As of now the tempering chain is trapped in this $\varepsilon$-stripe, as Ellis et al. [11]
show a discontinuous behavior of the global maximum as $\beta_i$ passes through $\beta_c^{(1)}(K)$.
Thus the chain will never get the chance to leave this $\varepsilon$-stripe within polynomial time
at any temperature, even though, at low temperature, this stripe has exponentially
little mass.
5.2. One bad cut for BEG's Metropolis chain. Following the idea stated earlier,
we show the existence of a bad cut within close proximity to the 0-axis in the
two-phase region. It is well known, due to Ellis et al. [11], that

$$a_{\max}(0) := \left(\frac{e^{-\beta}}{1 + 2e^{-\beta}}, \frac{1}{1 + 2e^{-\beta}}, \frac{e^{-\beta}}{1 + 2e^{-\beta}}\right) \in T_\infty \tag{5.1}$$

is the unique global maximum for $\beta < \beta_c^{(1)}(K)$ and a local, non-global, maximum for
$\beta > \beta_c^{(1)}(K)$. Here

$$T_\infty := \Big\{(a_{-1}, a_0, a_1) \in \mathbb{R}^3_+ : \sum_i a_i = 1\Big\} \tag{5.2}$$

is the set of all probability measures on three points. They further show that the
phase transition for fixed $K$ at $\beta_c^{(1)}(K)$ is discontinuous, thereby granting us,
uniformly in $\beta$, the existence of an $\varepsilon > 0$ such that

$$\mathcal{N} := \{\sigma \mid |S_N(\sigma)| < N \cdot \varepsilon\} \tag{5.3}$$

contains only this local maximum, and $f_\beta$ restricted to $B_\varepsilon(a_{\max}(0))$ is unimodal for
all $\beta > 0$. It is even possible to show $f_\beta$ restricted to $\mathcal{N}$ to be unimodal for all $\beta$, see
Lemma D.1 for details.
Recalling Section 3 we have

$$\pi_\beta(\sigma \text{ has type } N \cdot a) = \frac{1}{Z_\beta(N)}\, e^{N\left(\beta\left(K(a_{-1} - a_1)^2 - a_1 - a_{-1}\right) - \sum_{i=-1}^{1} a_i \log a_i\right) + \Delta(a)}$$

$$= \frac{1}{Z_\beta(N)}\, e^{N f_\beta(a) + \Delta(a)} \tag{5.4}$$

which implies that every local maximum of $f_\beta$ yields a locally exponential structure
in $\pi$. This leads to exponentially low conductance $\Phi_{\mathcal{N}}$ for all $\beta > \beta_c^{(1)}(K)$, thereby
implying slow mixing of the Metropolis Algorithm in this regime.
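The effect of an exponential barrier on the conductance of a set can be seen in a toy example. The sketch below assumes the standard definition $\Phi_S = \frac{1}{\pi(S)} \sum_{x \in S, y \notin S} \pi(x)P(x,y)$ (taken here to be the one meant in Theorem A.12) and uses a hypothetical three-state Metropolis chain with two wells separated by a barrier of height $b$. It is an illustration only and not the BEG chain itself.

```python
import math

def metropolis_path_chain(weights):
    """Metropolis chain on {0, ..., n-1} with nearest-neighbour proposals.

    Each existing neighbour is proposed with probability 1/2 and accepted
    with probability min(1, w_y / w_x); the rest of the mass stays put.
    Returns the stationary distribution pi and the transition matrix P.
    """
    n = len(weights)
    total = sum(weights)
    pi = [w / total for w in weights]
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in (x - 1, x + 1):
            if 0 <= y < n:
                P[x][y] = 0.5 * min(1.0, weights[y] / weights[x])
        P[x][x] = 1.0 - sum(P[x])
    return pi, P

def conductance_of_set(S, pi, P):
    """Phi_S: stationary flow out of S divided by the stationary mass of S."""
    S = set(S)
    flow = sum(pi[x] * P[x][y] for x in S for y in range(len(pi)) if y not in S)
    return flow / sum(pi[x] for x in S)

if __name__ == "__main__":
    for b in (1.0, 5.0, 10.0):
        pi, P = metropolis_path_chain([1.0, math.exp(-b), 1.0])
        print(b, conductance_of_set({0}, pi, P))
```

In this example $\Phi_{\{0\}} = \frac{1}{2} e^{-b}$, so the conductance of a well decays exponentially in the barrier height, mirroring the role of $N$ in the exponent of (5.4).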

5.3. The bad cut for BEG's Simulated Tempering chain. Having low conductance
$\Phi_{\mathcal{N}}$ for any $\beta > \beta_c^{(1)}(K)$ using the Metropolis Algorithm it is easy to generalize
this to the Simulated Tempering chain. To this end define

$$\mathcal{N}_{edge} := \{\sigma \mid N\varepsilon - 1 < |S_N(\sigma)| < N \cdot \varepsilon\} \tag{5.5}$$

and get

Theorem 5.1. Let $\mathcal{N}$ and $\mathcal{N}_{edge}$ be defined as in (5.3) and (5.5). For $K_{low} < K < K_c$
and any $\beta > 0$, there exists an $\varepsilon > 0$ such that for sufficiently large $N$,

$$\frac{\pi_\beta(\mathcal{N}_{edge})}{\pi_\beta(\mathcal{N})} \le e^{-cN} \tag{5.6}$$

holds, with $c > 0$ only depending on $K$.
Proof. Recall equation (5.4)

$$\pi_\beta(\sigma \text{ has type } N \cdot a) = \frac{1}{Z_\beta(N)}\, e^{N f_\beta(a) + \Delta(a)}$$

and verify that there are only polynomially (in $N$) many $a \in T$ which satisfy $N \cdot a \in
\mathcal{N}_{edge}$. Then, considering

$$f_\beta(a) = \beta\left(K(a_{-1} - a_1)^2 - a_1 - a_{-1}\right) - \sum_{i=-1}^{1} a_i \log a_i$$

and the results presented by Ellis et al. [11] it is clear that $f_\beta$ has a local maximum
at $a_{\max}(0)$ (see equation (5.1)). Due to $f_\beta$ being smooth in $a_{\max}$ it is clearly possible
to find an $\varepsilon > 0$ such that $f_\beta$ is unimodal on $B_\varepsilon(a_{\max})$. Due to the discontinuous
behavior of the system at $\beta_c^{(1)}(K)$ for $K \in (K_{low}, K_c)$ and as $f_\beta(a)$ is smooth in all
variables, including $\beta$, this $\varepsilon$ can be chosen uniform in $\beta$.

Combining this with the exponential structure of (5.4) leads to the desired result
with $c$ depending only on $K$ and sufficiently large $N$. □
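The rate function used in the proof is simple enough to evaluate directly. The following sketch assumes the form $f_\beta(a) = \beta(K(a_{-1} - a_1)^2 - a_1 - a_{-1}) - \sum_i a_i \log a_i$ and the point $a_{\max}(0)$ from (5.1). It checks numerically that $a_{\max}(0)$ is a critical point of $f_\beta$ on the simplex and that $f_\beta(a_{\max}(0)) = \log(1 + 2e^{-\beta})$, a value obtained here by direct substitution rather than quoted from the text. Whether the critical point is a local maximum depends on $\beta$ and $K$ and is not tested.

```python
import math

def f_beta(a, beta, K):
    """Rate function f_beta at a = (a_minus, a_zero, a_plus) on the simplex."""
    a_minus, a_zero, a_plus = a
    energy = beta * (K * (a_minus - a_plus) ** 2 - a_plus - a_minus)
    entropy = -sum(v * math.log(v) for v in a if v > 0.0)
    return energy + entropy

def a_max_zero(beta):
    """The symmetric point a_max(0) from (5.1)."""
    c = 1.0 + 2.0 * math.exp(-beta)
    return (math.exp(-beta) / c, 1.0 / c, math.exp(-beta) / c)

def directional_derivative(a, direction, beta, K, h=1e-6):
    """Central difference of f_beta at a along a direction tangent to the simplex."""
    forward = tuple(v + h * d for v, d in zip(a, direction))
    backward = tuple(v - h * d for v, d in zip(a, direction))
    return (f_beta(forward, beta, K) - f_beta(backward, beta, K)) / (2.0 * h)

if __name__ == "__main__":
    beta, K = 1.0, 3.0
    a = a_max_zero(beta)
    print(f_beta(a, beta, K), math.log(1.0 + 2.0 * math.exp(-beta)))
    print(directional_derivative(a, (1, -1, 0), beta, K))
    print(directional_derivative(a, (0, -1, 1), beta, K))
```

The two tangent directions span all perturbations that keep the coordinates summing to one, so vanishing derivatives along both indicate a critical point.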
This is the main ingredient for this section's main

Theorem 5.2. Define $\mathcal{N}$ and $\mathcal{N}_{edge}$ as in Theorem 5.1. For $K_{low} < K < K_c$ and
$\beta > \beta_c^{(1)}(K)$, let $\beta_i = \frac{i}{M}\beta$ for $i = 0, \dots, M$. Then for the Simulated Tempering
Markov chain, the set

$$S := \{(x, i) \mid x \in \mathcal{N},\ i = 0, \dots, M\}$$
satisfies $\Phi_S \le e^{-cN}$ with $c > 0$.

Remark For the definition of the conductance $\Phi_S$ of a set $S$, see Theorem A.12.
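Proof. Using Theorem 5.1 we get

$$\Phi_S = \frac{\sum_{\beta_i} \sum_{x \in \mathcal{N}_{edge}} \sum_{x' \in \mathcal{N}^c} \pi_i(x) P(x, x')}{\sum_{\beta_i} \sum_{x \in \mathcal{N}} \pi_i(x)}$$

$$\le \frac{\sum_{\beta_i} \sum_{x \in \mathcal{N}_{edge}} \pi_i(x)}{\sum_{\beta_i} \sum_{x \in \mathcal{N}} \pi_i(x)}$$

$$= \frac{\sum_{\beta_i} \pi_i(\mathcal{N}_{edge})}{\sum_{\beta_i} \pi_i(\mathcal{N}_{edge}) \frac{\pi_i(\mathcal{N})}{\pi_i(\mathcal{N}_{edge})}}$$

$$\le \frac{\sum_{\beta_i} \pi_i(\mathcal{N}_{edge})}{e^{cN} \sum_{\beta_i} \pi_i(\mathcal{N}_{edge})}$$

$$= e^{-cN}.$$

□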



This concludes the proof of Theorem 3.2 by using a variant of Theorem A.12 in the
appendix: Indeed, we do not have $\pi(S) \le 1/2$ for all $\beta > \beta_c^{(1)}(K)$, but as an easy
extension of Theorem A.12 one obtains

$$\mathrm{Gap}(P) \le \frac{\Phi}{1 - q}$$

if we define

$$\Phi = \min_{S : \pi(S) \le q} \Phi_S,$$

for some $q \in (0, 1)$.

As we chose $\beta > \beta_c^{(1)}(K)$, and set $\beta_i = \frac{i}{M}\beta$ for $i = 0, \dots, M$, there exists a $p \in (0, 1)$
such that $\beta_i \le \beta_c^{(1)}(K)$ for $i \le pM$ and $\beta_i > \beta_c^{(1)}(K)$ for $i > pM$. For $\beta_i > \beta_c^{(1)}(K)$,
we have $\pi_{\beta_i}(\mathcal{N}) \le 1/2$, since $a_{\max}(0)$ is a local maximum, which implies

$$\pi(S) = \frac{1}{M+1} \sum_{i=0}^{M} \pi_i(\mathcal{N}) \le q := p + \frac{1 - p}{2} < 1.$$
Appendix A. General preparations

In this section we give some fundamental definitions and some well known lemmas
on Markov chains from other articles. We state them in this section for the reader's
convenience.

Definition A.1. Let $\mathcal{A}$ be a sigma-field on a set $\Omega$. The total variation distance
between two probability measures $\pi$ and $\tau$ on $(\Omega, \mathcal{A})$ is defined by

$$d(\pi, \tau)_{TV} := \sup\{|\pi(A) - \tau(A)| \mid A \in \mathcal{A}\}.$$
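On a finite set the supremum in this definition is attained and equals half the $\ell^1$ distance between the two probability vectors. A minimal sketch, with distributions represented as dictionaries (an arbitrary choice made for this illustration), compares both expressions on a small example.

```python
from itertools import combinations

def total_variation(pi, tau):
    """Total variation distance between two distributions given as dicts."""
    states = set(pi) | set(tau)
    return 0.5 * sum(abs(pi.get(s, 0.0) - tau.get(s, 0.0)) for s in states)

def total_variation_by_events(pi, tau):
    """Brute-force supremum of |pi(A) - tau(A)| over all events A (small spaces only)."""
    states = sorted(set(pi) | set(tau))
    best = 0.0
    for size in range(len(states) + 1):
        for event in combinations(states, size):
            diff = abs(sum(pi.get(s, 0.0) for s in event) - sum(tau.get(s, 0.0) for s in event))
            best = max(best, diff)
    return best

if __name__ == "__main__":
    pi = {"a": 0.5, "b": 0.5}
    tau = {"a": 0.2, "b": 0.3, "c": 0.5}
    print(total_variation(pi, tau), total_variation_by_events(pi, tau))
```

The brute-force version enumerates all events and is only meant as a check of the half-$\ell^1$ formula.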

The fundamental result for all that follows is

Theorem A.2 (Ergodic Theorem for Markov chains). Let $(X_0, X_1, X_2, \dots)$ be an
irreducible aperiodic Markov chain with state space $S = \{s_1, \dots, s_k\}$, transition matrix
$P$ and arbitrary initial distribution $\mu^{(0)}$. Then there exists a unique distribution $\pi$
which is stationary for the transition matrix $P$. If $\mu^{(n)}$ denotes the distribution of $X_n$
then

$$\mu^{(n)} \xrightarrow{TV} \pi.$$

In general, the definition of stationarity proves complicated to construct or to verify
for a given transition matrix $P$ or for a given probability distribution $\pi$. There is the
tighter concept of reversibility which, in most cases, is much easier to construct.

Definition A.3. Let $(X_0, X_1, \dots)$ be a Markov chain with state space $S = \{s_1, \dots, s_k\}$
and transition matrix $P$. A probability distribution $\pi$ on $S$ is said to be reversible for
the chain if for all $x, y \in S$ we have

$$\pi(x)P(x, y) = \pi(y)P(y, x).$$

The Markov chain is said to be reversible if there exists a reversible distribution for
it.
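Both notions are easy to test on small matrices. The sketch below (hypothetical helper names) checks stationarity and reversibility for two three-state examples: a Metropolis chain, which is reversible by construction, and a lazy cyclic walk, for which the uniform distribution is stationary but not reversible.

```python
def is_stationary(pi, P, tol=1e-12):
    """Check pi P = pi for a distribution pi and transition matrix P."""
    n = len(pi)
    return all(abs(sum(pi[x] * P[x][y] for x in range(n)) - pi[y]) <= tol for y in range(n))

def is_reversible(pi, P, tol=1e-12):
    """Check the detailed balance equations pi(x) P(x,y) = pi(y) P(y,x)."""
    n = len(pi)
    return all(abs(pi[x] * P[x][y] - pi[y] * P[y][x]) <= tol for x in range(n) for y in range(n))

def metropolis_uniform_proposal(pi):
    """Metropolis chain for pi where each other state is proposed with equal probability."""
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if y != x:
                P[x][y] = min(1.0, pi[y] / pi[x]) / (n - 1)
        P[x][x] = 1.0 - sum(P[x])
    return P

if __name__ == "__main__":
    pi = [0.2, 0.3, 0.5]
    P = metropolis_uniform_proposal(pi)
    print(is_stationary(pi, P), is_reversible(pi, P))
    uniform = [1 / 3, 1 / 3, 1 / 3]
    lazy_cycle = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
    print(is_stationary(uniform, lazy_cycle), is_reversible(uniform, lazy_cycle))
```

Summing the detailed balance equations over $x$ shows that a reversible distribution is always stationary, while the cyclic example shows that the converse fails.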

The key question for all kinds of MCMC algorithms is how fast they mix, i.e. how
rapidly they converge to the desired invariant measure. So in general, let $(X_n)_{n \ge 0}$
be a homogeneous, irreducible and aperiodic Markov chain on a finite state space
$\Omega$, reversible with respect to a probability measure $\pi$ (on $\Omega$, that necessarily charges
every point). The speed of convergence is determined in terms of

$$\tau(\varepsilon) = \min\{n : d_{TV}(\mu^{(n)}, \pi) \le \varepsilon\}.$$

Here, of course, $\mu^{(n)}$ is the distribution at time $n$ of the Markov chain corresponding to
the algorithm and $d_{TV}(\mu^{(n)}, \pi)$ is the total variation distance between this distribution
at time $n$ and the invariant measure $\pi$ of the chain. Rapid convergence of such a
MCMC algorithm means that one can bound $\tau(\varepsilon)$ by a polynomial in $\varepsilon^{-1}$ and the
problem size. The algorithm is said to be torpidly mixing if it is not rapidly mixing.
There is an intrinsic relationship between $\tau(\varepsilon)$ and the spectral gap of the chain
defined by

$$\mathrm{Gap}((X_n)) := \mathrm{Gap}(P) := 1 - \max\{|\lambda_i|,\ \lambda_i \ne 1\} =: 1 - |\lambda_1|,$$

where we write $\lambda_i$ for the eigenvalues of the transition matrix $P = (P(x, y))_{x,y}$ of
the chain $(X_n)$ and have $\lambda_1$ denote the second largest eigenvalue. For this define the
Dirichlet form of $P$ by

$$\mathcal{E}(f, f) := \frac{1}{2} \sum_{x,y} |f(x) - f(y)|^2\, P(x, y)\,\pi(x) \tag{A.1}$$

for any function $f : \Omega \to \mathbb{R}$. If we further define

$$\mathcal{V}(f) := E_\pi(f^2) - (E_\pi f)^2 = \frac{1}{2} \sum_{x,y} |f(x) - f(y)|^2\, \pi(x)\pi(y) \tag{A.2}$$
it follows that



Gap(P) = inf { R^f* < oc, V(/) + 



As a matter of fact, for an irreducible and aperiodic chain the following estimates
hold true (see e.g. [25]): Let $\pi_- := \min_x \pi(x)$ (which is non-zero by the ergodic
theorem for Markov chains), then

$$\tau(\varepsilon) \le \frac{1}{\mathrm{Gap}(P)} \log\left(\frac{1}{\pi_- \varepsilon}\right)$$

as well as

$$\tau(\varepsilon) \ge \frac{1}{2\,\mathrm{Gap}(P)} \log\left(\frac{1}{2\varepsilon}\right).$$
We can thus control the speed of convergence of the Markov chain (or the MCMC
algorithm, respectively), if we control the size of the spectral gap of $P$.

Lemma A.4 (Lemma 3 of [22]). Let $P$ be a Markov chain that is reversible with
respect to a probability measure $\pi$ on the finite state space $S$. Also assume that
$P(x, x) \ge \frac{1}{2}$ for every $x \in S$. Then $P$ is a positive operator.

Lemma A.5 (Poincaré inequality, Proposition 1' of [5]). Let $P$ be an irreducible and
reversible Markov chain on a finite state space $S$. We associate to $P$ the graph with
vertex set $S$ and edges $(x, y)$ if and only if $P(x, y) > 0$. For each pair of distinct
points $x, y \in S$, we choose a path $\gamma_{xy}$ from $x$ to $y$, such that a given edge appears at
most once in a given path. Then the second largest eigenvalue $\lambda_1$ of $P$ satisfies

$$\lambda_1 = 1 - \mathrm{Gap}(P) \le 1 - \frac{1}{A}$$

where

$$A := \max_{(x,y)} \frac{1}{\pi(x)P(x, y)} \sum_{\gamma_{z_1 z_2} \ni (x,y)} |\gamma_{z_1 z_2}|\,\pi(z_1)\pi(z_2)$$

and $|\gamma_{z_1 z_2}|$ denotes the number of edges in the path $\gamma_{z_1 z_2}$.

Lemma A.6 (Comparison of Dirichlet forms, Theorem 2.1 of [4]). Let $\tilde{P}, \tilde\pi$ and
$P, \pi$ be reversible Markov chains on a finite state space $S$, with respective Dirichlet
forms $\tilde{\mathcal{E}}$ and $\mathcal{E}$. For each pair $x \ne y$, with $\tilde{P}(x, y) > 0$, we fix a path $\gamma_{xy} = (x_0 =
x, x_1, x_2, \dots, x_k = y)$, such that $P(x_i, x_{i+1}) > 0$, of length $|\gamma_{xy}| = k$. Set $E = \{(x, y) :
P(x, y) > 0\}$, $\tilde{E} = \{(x, y) : \tilde{P}(x, y) > 0\}$ and $\tilde{E}(e) = \{(x, y) \in \tilde{E} : e \in \gamma_{xy}\}$, where
$e \in E$. Then

$$\tilde{\mathcal{E}} \le A\,\mathcal{E},$$

where

$$A := \max_{(z,w) \in E} \frac{1}{\pi(z)P(z, w)} \sum_{\tilde{E}(z,w)} |\gamma_{xy}|\,\tilde\pi(x)\tilde{P}(x, y).$$

Lemma A.7 (Lemma 5 of [22]). Let $(P, \pi)$ and $(\tilde{P}, \tilde\pi)$ be two Markov chains on the
same finite state space $S$, with respective Dirichlet forms $\mathcal{E}$ and $\mathcal{E}'$. Assume that there
exist constants $A, a > 0$ such that

$$\mathcal{E}' \le A\,\mathcal{E} \quad \text{and} \quad a\pi \le \tilde\pi.$$

Then

$$\mathrm{Gap}(\tilde{P}) \le \frac{A}{a}\,\mathrm{Gap}(P).$$

Remark A sufficient condition for $\mathcal{E}' \le A\,\mathcal{E}$ is that

$$\tilde\pi(x)\tilde{P}(x, y) \le A\,\pi(x)P(x, y) \quad \text{for all } x, y \in S \text{ such that } x \ne y.$$
Lemma A.8 (Lemma 6 of [22]). For any reversible finite Markov chain $P$,

$$\mathrm{Gap}(P) \ge \frac{1}{m}\,\mathrm{Gap}(P^m) \quad \forall m \in \mathbb{N}^*.$$

Lemma A.9 (Lemma 7 of [22]). Let $A$ and $B$ be Markov kernels reversible with
respect to a distribution $\pi$. The following holds for $A$ and $B$:

$$\mathrm{Gap}(ABA) \ge \mathrm{Gap}(B).$$

This also holds for $A$ substituted by $A$'s positive square root $A^{1/2}$, if additionally $A$ is
a nonnegative (self-adjoint) operator.

Theorem A.10 (Caracciolo-Pelissetto-Sokal [22]). Let $\mu$ be a probability distribution
on a finite state space $S$, and let $P$ be a transition matrix reversible with respect to
$\mu$. Suppose that we partition the set $S$ as

$$S = \bigcup_{i=1}^{m} S_i, \quad \text{with } S_i \cap S_j = \emptyset \text{ if } i \ne j.$$

For each $i = 1, \dots, m$, let $P_i$ be the restriction of $P$ to $S_i$, by rejecting jumps that
leave $S_i$: for all $x \in S_i$, for all $B \subset S_i$,

$$P_i(x, B) = P(x, B) + 1_{\{x \in B\}}\, P(x, S \setminus S_i).$$

Let $Q$ be a positive operator, that is also reversible with respect to $\mu$, and $\bar{Q}$ the
aggregated chain associated to the partition $(S_i)_{i=1,\dots,m}$; more precisely, for $i, j =
1, \dots, m$,

$$\bar{Q}(i, j) := \frac{1}{\mu(S_i)} \sum_{x \in S_i} \sum_{y \in S_j} \mu(x)\,Q(x, y).$$

Let $Q^{1/2}$ be the positive square root of $Q$. Then

$$\mathrm{Gap}(Q^{1/2} P\, Q^{1/2}) \ge \mathrm{Gap}(\bar{Q}) \cdot \min_{1 \le i \le m} \mathrm{Gap}(P_i). \tag{A.3}$$

Theorem A.11 (Diaconis and Saloff-Coste [3]). For $i = 0, \dots, M$, let $P_i$ be a reversible
Markov chain on a finite state space $\Omega_i$. Consider the product Markov chain $P$ on
the product space $\Omega_0 \times \dots \times \Omega_M$, defined by

$$P = \frac{1}{M+1} \sum_{i=0}^{M} I \otimes \dots \otimes I \otimes P_i \otimes I \otimes \dots \otimes I, \tag{A.4}$$

where (in a slight abuse of notation) $I$ denotes the identity on the space it is defined.
Then $\mathrm{Gap}(P) = \frac{1}{M+1} \min_{i \in \{0, \dots, M\}} \{\mathrm{Gap}(P_i)\}$.

Theorem A. 12 (Jerrum and Sinclair [26J). Let P be a Markov chain on a finite set 
Q reversible with respect to ir. For all S C Q, let 

<S) 

and the conductance $ given by 

$ = min $5. 

S:tt(S)<1/2 

Then we have 

$2 

— < Gap(P) < 2$. 
Appendix B. Random 3-coloring of the complete graph

In this section, we give a rapidly mixing Markov chain $(X_i)$ whose stationary distribution is the uniform distribution on the set of all 3-colorings with a prescribed number of vertices of each color. This will be of use, as we intend to compare the Metropolis algorithm on $A_{S,T}$ (see (3.5)) of the BEG model with this chain in order to show rapid mixing.

Let $\Lambda = \{1, \dots, N\}$ and define $\Omega = \{-1, 0, 1\}^{\Lambda}$ to be the set of all possible 3-colorings of $\Lambda$. Note that we do not restrict ourselves to 3-colorings in the graph-theoretic sense, where adjacent vertices are required to have different colors. Further consider a tuple $(a_1, a_2, a_3) \in T$, so that $N a_i$ is the number of vertices that have color $i$. Now let

$$C := \{x \in \Omega : |\{j \in \Lambda : x_j \text{ has color } i\}| = N a_i \text{ for } i = 1, 2, 3\} \quad (B.1)$$

be the set of appropriate 3-colorings and $p$ the uniform distribution on $C$. Our aim is to give a Markov chain $(X_i)_{i \in \mathbb{N}}$ which compares well to the chain we consider in Section 4.3.2 for the BEG model and which also samples efficiently from $p$.

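B.1. Rapid mixing of $(X_i)$. Fix $C$ as in (B.1). Consider the Markov chain $(X_i)$ on $C$ with the following transition kernel. Take $(\mathcal{R}_1(i))_{i \in \mathbb{N}}$ and $(\mathcal{R}_2(i))_{i \in \mathbb{N}}$ independent and uniformly distributed on $\{1, \dots, N\}$. Define

$$X_1 := X \in C, \qquad X_{i+1} := \begin{cases} (\mathcal{R}_1(i), \mathcal{R}_2(i))(X_i) & \text{if } \mathcal{R}_1(i) \ne \mathcal{R}_2(i), \\ X_i & \text{otherwise} \end{cases} \quad (B.2)$$

(where $X$ is any admissible starting point and, for a vector $x := (x_1, \dots, x_N)$ and $i, j \in \{1, \dots, N\}$, we write $(i,j)(x_1, \dots, x_N)$ for the vector $x$ with the components $i$ and $j$ interchanged) and verify that $(X_i)$ has reversible distribution $p$ on $C$. We will use a coupling argument to show rapid convergence to equilibrium of $(X_i)$. To this end define

$$X'_1 := X' \in C$$

with $X'$ drawn according to $p$, and iteratively

$$C(i) := \{j \in \{1, \dots, N\} : X_i(j) \ne X'_i(j)\} \quad (B.3)$$

with

$$X'_{i+1} := \begin{cases}
X'_i & \text{if } \mathcal{R}_1(i) = \mathcal{R}_2(i), \\
(\mathcal{R}_1(i), \mathcal{R}_2(i))(X'_i) & \text{if } X_i(\mathcal{R}_1(i)) = X'_i(\mathcal{R}_1(i)) \wedge X_i(\mathcal{R}_2(i)) \ne X'_i(\mathcal{R}_2(i)), \\
(\mathcal{R}_1(i), \mathcal{R}_2(i))(X'_i) & \text{if } X_i(\mathcal{R}_1(i)) \ne X'_i(\mathcal{R}_1(i)) \wedge X_i(\mathcal{R}_2(i)) = X'_i(\mathcal{R}_2(i)), \\
(\mathcal{R}_1(i), \mathcal{R}_2(i))(X'_i) & \text{if } X_i(\mathcal{R}_1(i)) = X'_i(\mathcal{R}_1(i)) \wedge X_i(\mathcal{R}_2(i)) = X'_i(\mathcal{R}_2(i)), \\
(\mathcal{R}_1(i), \mathcal{R}_3(i))(X'_i) & \text{otherwise}
\end{cases} \quad (B.4)$$

and $\mathcal{R}_3(i)$ being uniformly drawn from $C(i)$ and independent of $(\mathcal{R}_1(i))$ and $(\mathcal{R}_2(i))$. Again verify that $(X'_i)$ is a Markov chain which is reversible with respect to $p$ on $C$. Thus $(X'_i)$ is in equilibrium in every step.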

Lemma B.1. The expected coupling time $T_C$ of the Markov chains $(X_i)$ and $(X'_i)$ is bounded from above by

$$\mathbb{E}\,T_C \le N^4.$$

Proof. Define $\Psi(i) := |C(i)|$. Once $\Psi(i) = 0$ the two chains have coupled. Due to the construction, $\Psi$ is monotonically decreasing. Indeed, if $X_i(k) = X'_i(k)$ holds for one $i$ and a $k \in \{1, \dots, N\}$, then $X_{i+1}$ and $X'_{i+1}$ agree at the position that $k$ is permuted to. We further know

$$\mathbb{P}(\Psi(i+1) \le j - 1 \mid \Psi(i) = j > 0) \ge \frac{1}{N^3},$$

as all that needs to happen is to find two components $k_1$ and $k_2$ such that the chains differ at both positions and the number of differences can be reduced by at least one through exchanging spins in one of the chains. Such $k_1$ and $k_2$ always exist, and we can choose these with $\mathcal{R}_1$ and $\mathcal{R}_2$, which happens with probability $\frac{1}{N^2}$. In this case $\mathcal{R}_3$ would be drawn out of all components in which $X_i$ and $X'_i$ differ. There are at most $N$ of those. Using [1, Chapter 4-3, Lemma 1] we get an upper bound of

$$\mathbb{E}\,T_C \le \sum_{i=1}^{N} N^3 = N^4$$

for the coupling time. □
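
The base chain of this appendix, which swaps the colors at two uniformly chosen positions, can be sketched as follows. This is only an illustration of the uncoupled chain $(X_i)$; the function names, the seed, and the starting configuration are hypothetical, and the coupled chain $(X'_i)$ is not implemented here.

```python
import random
from collections import Counter

def transposition_step(x, rng):
    """One step of the chain: pick two positions uniformly at random and swap their colors."""
    n = len(x)
    r1, r2 = rng.randrange(n), rng.randrange(n)   # r1 == r2 leaves x unchanged
    y = list(x)
    y[r1], y[r2] = y[r2], y[r1]
    return y

def run_chain(x0, steps, seed=0):
    """Run the transposition chain for a fixed number of steps from x0."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        x = transposition_step(x, rng)
    return x

# Hypothetical starting coloring with 3, 4 and 5 vertices of colors -1, 0 and 1.
start = [-1] * 3 + [0] * 4 + [1] * 5
final = run_chain(start, steps=1000)
print(Counter(start) == Counter(final))
```

Since every step is a transposition, the number of vertices of each color never changes, so the chain indeed stays inside the set of colorings with prescribed color counts.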

Appendix C. Existence of $K_{low}$

As [11] did not completely prove the existence of $K_{low} := \lim_{\beta \to +\infty} K_c^{(1)}(\beta)$, we will do so in this section.

Lemma C.1. The function

$$K_c^{(1)} : (\beta_c, \infty) \to \mathbb{R}, \quad \beta \mapsto K_c^{(1)}(\beta)$$

is continuous.

Proof. As shown in [11], $K_c^{(1)}(\beta_c) = K_c^{(2)}(\beta_c)$. It is also shown there that $K_c^{(2)}$ is continuous and monotonically decreasing on its domain.

Assume that $K_c^{(1)}$ is not continuous. Then there exists a $\beta_d \ge \beta_c$ such that either $K_c^{(1)}$ is discontinuous at $\beta_d = \beta_c$, or $K_c^{(1)}$ is discontinuous at $\beta_d$ and continuous for all $\beta \in [\beta_c, \beta_d)$. Then there exists a monotonic sequence $(\beta_i)$ with $\beta_i \ne \beta_d$,

$$\lim_{i \to \infty} \beta_i = \beta_d \quad \text{and} \quad \lim_{i \to \infty} K_c^{(1)}(\beta_i) \ne K_c^{(1)}(\beta_d).$$

(1) Suppose first that $\lim K_c^{(1)}(\beta_i) < K_c^{(1)}(\beta_d)$. Fix $K_d \in (\lim K_c^{(1)}(\beta_i), K_c^{(1)}(\beta_d))$. The analysis given by Ellis et al. in [11] guarantees that the BEG state space for $(K_d, \beta_d)$ has exactly one macrostate, while for all but finitely many $i$ the BEG state space for $(K_d, \beta_i)$ has exactly two modes. Let $f_\beta$ be as defined in (3.10). It is smooth and clearly, for $K = K_d$, we have the functional limit

$$\lim_{i \to \infty} f_{\beta_i} = f_{\beta_d}.$$

Thus in this case $f_{\beta_d}$ has either exactly one global maximum or exactly three global maxima.

(2) The second case, $\lim K_c^{(1)}(\beta_i) > K_c^{(1)}(\beta_d)$, works the same way.

□

Lemma C.2. The function

$$K_c^{(1)} : (\beta_c, \infty) \to \mathbb{R}, \quad \beta \mapsto K_c^{(1)}(\beta)$$

is monotonic.

Proof. Assume that $K_c^{(1)}$ is not monotonic. Then there exist $\beta_1 < \beta_2 < \beta_3 < \beta_4$ such that $K_c^{(1)}(\beta_4) > K_c^{(1)}(\beta_1) = K_c^{(1)}(\beta_3) > K_c^{(1)}(\beta_2)$, as $K_c^{(1)}$ is continuous as shown in Lemma C.1. This guarantees that the BEG model has at least two phase transitions for $K_c^{(1)}(\beta_1)$. With the analysis done by Ellis et al. in [11] it is, however, clear where exactly the macrostates lie. Thus the first phase transition of the model must switch from one to two modes, and the second one, which the model exhibits for growing $\beta$, must change back to exactly one mode. This is in clear violation of Lemma 4.2 of this paper. □

Corollary C.3. The proof given in Section 5 by Ellis et al. in [11] is correct if $K_c^{(1)}$ is inverted only on $\mathrm{Im}(K_c^{(1)}((\beta_c, \infty)))$.

Corollary C.4. The limit of $K_c^{(1)}(\beta)$ as $\beta \to +\infty$ exists.

Appendix D. Analysis of $f_\beta$

This appendix contains a detailed analysis of the function $f_\beta$ given in (3.10). The first result we prove in this appendix is Theorem 3.1. We first change coordinates. Let $r = \frac{a_1}{a_{-1} + a_1}$ and $t = a_{-1} + a_1$. Then the mapping

$$T : T_\infty \to (0,1)^2 \quad \text{with} \quad (a_{-1}, a_0, a_1) \mapsto (r, t)$$

is bijective. Hence, instead of investigating the maxima of $f_\beta$, we can analyze the minima of $F(r,t) := F_\beta(r,t) := -f_\beta \circ T^{-1}(r,t)$. Here $F : (0,1)^2 \to \mathbb{R}$ is given by

$$F(r,t) = \beta t (1 - K t (1 - 2r)^2) + t H(r) + H(t),$$

with $H(r) = r \log r + (1 - r) \log(1 - r)$.
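
The function $F$ is easy to evaluate numerically, which is handy for sanity checks of the analysis below. The following sketch codes $F$ and $H$ as stated and checks, with a finite-difference gradient, that the symmetric point $r = 1/2$, $t = 2/(2 + e^\beta)$ is critical. The parameter values $\beta = 1$, $K = 2$ and all function names are chosen only for illustration.

```python
import math

def H(x):
    """H(x) = x log x + (1 - x) log(1 - x) on (0, 1)."""
    return x * math.log(x) + (1.0 - x) * math.log(1.0 - x)

def F(r, t, beta, K):
    """F(r, t) = beta t (1 - K t (1 - 2r)^2) + t H(r) + H(t)."""
    return beta * t * (1.0 - K * t * (1.0 - 2.0 * r) ** 2) + t * H(r) + H(t)

def grad_F(r, t, beta, K, h=1e-6):
    """Central finite-difference approximation of the gradient of F."""
    dr = (F(r + h, t, beta, K) - F(r - h, t, beta, K)) / (2.0 * h)
    dt = (F(r, t + h, beta, K) - F(r, t - h, beta, K)) / (2.0 * h)
    return dr, dt

beta, K = 1.0, 2.0                       # illustrative parameters
t0 = 2.0 / (2.0 + math.exp(beta))        # solves d_t F = 0 at r = 1/2
g = grad_F(0.5, t0, beta, K)
print(g)                                 # both components are numerically close to zero
print(F(0.3, 0.4, beta, K) - F(0.7, 0.4, beta, K))   # symmetry in r around 1/2
```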

Minima at the boundary: For fixed $r \in [0,1]$ the function $F$ is the sum of a polynomial in $t$ and the entropy function $H(t)$. Now $H(t)$ is steep at $t = 0$ and $t = 1$, hence there are no local minima at these points.

If, on the other hand, $t \in (0,1)$ is fixed, the same argument yields that there are no local minima at $r = 0$ and $r = 1$ either.

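Global and local minima: We take derivatives of $F$ for $r, t \in (0,1)$:

$$\partial_r F(r,t) = 4\beta K t^2 (1 - 2r) + t \log\frac{r}{1-r},$$

$$\partial_t F(r,t) = \beta - 2\beta K t (1 - 2r)^2 + H(r) + \log\frac{t}{1-t},$$

$$\partial_r^2 F(r,t) = -8\beta K t^2 + \frac{t}{r(1-r)},$$

$$\partial_{rt}^2 F(r,t) = 8\beta K t (1 - 2r) + \log\frac{r}{1-r},$$

$$\partial_t^2 F(r,t) = -2\beta K (1 - 2r)^2 + \frac{1}{t(1-t)}.$$

Hence the equations for potential minima are

$$4\beta K t (2r - 1) = \log\frac{r}{1-r}, \quad (D.1)$$

$$\frac{1}{t} - 1 = e^{\beta} \sqrt{r(1-r)}, \quad (D.2)$$

where we have used (D.1) to solve $\partial_t F = 0$ and obtain (D.2). Taking the Taylor expansion of $F$ in a critical point $(r_0, t_0)$ up to second order we see that

$$F(r,t) = F(r_0, t_0) + \frac{1}{2} A,$$

where

$$A = \partial_r^2 F(r_0,t_0)(r - r_0)^2 + 2\,\partial_{rt}^2 F(r_0,t_0)(r - r_0)(t - t_0) + \partial_t^2 F(r_0,t_0)(t - t_0)^2.$$

Putting $w := \sqrt{r_0(1 - r_0)}$ we see that $t_0 = (1 + e^{\beta} w)^{-1}$ and therefore

$$\partial_r^2 F(r_0, t_0) = \frac{t_0^2}{w^2}\,(1 + e^{\beta} w - 8\beta K w^2). \quad (D.3)$$

Due to (D.1) we have in critical points $(r_0, t_0)$

$$\partial_{rt}^2 F(r_0, t_0) = 4\beta K t_0 (1 - 2r_0)$$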
and the determinant of the Hessian $M$ in $(r_0, t_0)$ is given by

$$\det M = \left(\frac{t_0}{w^2} - 8\beta K t_0^2\right)\left(\frac{1}{t_0(1 - t_0)} - 2\beta K (1 - 4w^2)\right) - (4\beta K t_0)^2 (1 - 4w^2).$$

This can be simplified to

$$\det M = \left(\frac{1}{w^2} - 8\beta K t_0\right)\frac{1}{1 - t_0} - 2\beta K \frac{t_0}{w^2}(1 - 4w^2) = \frac{1 - 2\beta K t_0 + 2\beta K t_0^2 (1 - 4w^2)}{w^2 (1 - t_0)},$$

and by replacing $t_0$ we obtain

$$\det M = \frac{(1 + e^{\beta} w)^2 - 2\beta K (1 + e^{\beta} w) + 2\beta K (1 - 4w^2)}{w^2 (1 - t_0)(1 + e^{\beta} w)^2} = \frac{1 + 2 e^{\beta} w (1 - \beta K) + w^2 (e^{2\beta} - 8\beta K)}{w^2 (1 - t_0)(1 + e^{\beta} w)^2}. \quad (D.4)$$

Note that the sign of $\det M$ is determined by the sign of the numerator, which is important since $M$ is positive definite in $(r_0, t_0)$ if $\partial_r^2 F > 0$ and $\det M > 0$ at that point.
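
The closed form (D.4) can be cross-checked numerically against the Hessian determinant built directly from the second derivatives listed above. The sketch below locates a non-symmetric critical point by bisection on $\varphi(r) = h(r)$, with $h(r) = \log\frac{r}{1-r}$ and $\varphi(r) = \frac{4\beta K(2r-1)}{1 + e^\beta\sqrt{r(1-r)}}$ obtained by inserting (D.2) into (D.1). The parameters $\beta = 1$, $K = 1.3$, the bracketing interval, and the helper names are assumptions made for this illustration.

```python
import math

def w(r):
    return math.sqrt(r * (1.0 - r))

def h(r):
    return math.log(r / (1.0 - r))

def phi(r, beta, K):
    return 4.0 * beta * K * (2.0 * r - 1.0) / (1.0 + math.exp(beta) * w(r))

def critical_r(beta, K, lo=0.501, hi=1.0 - 1e-12):
    """Bisection for phi(r) = h(r) on (1/2, 1); requires phi - h to change sign on [lo, hi]."""
    g = lambda r: phi(r, beta, K) - h(r)
    if not (g(lo) > 0.0 > g(hi)):
        raise ValueError("no sign change of phi - h on [lo, hi]")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def hessian_det(r, t, beta, K):
    """Determinant of the Hessian of F from the explicit second derivatives."""
    d_rr = -8.0 * beta * K * t * t + t / (r * (1.0 - r))
    d_rt = 8.0 * beta * K * t * (1.0 - 2.0 * r) + math.log(r / (1.0 - r))
    d_tt = -2.0 * beta * K * (1.0 - 2.0 * r) ** 2 + 1.0 / (t * (1.0 - t))
    return d_rr * d_tt - d_rt ** 2

def det_closed_form(r, beta, K):
    """Right-hand side of (D.4), valid at critical points only."""
    ww, e = w(r), math.exp(beta)
    t = 1.0 / (1.0 + e * ww)
    numerator = 1.0 + 2.0 * e * ww * (1.0 - beta * K) + ww ** 2 * (e ** 2 - 8.0 * beta * K)
    return numerator / (ww ** 2 * (1.0 - t) * (1.0 + e * ww) ** 2)

beta, K = 1.0, 1.3                        # illustrative parameters with 4*beta*K > 2 + e^beta
r1 = critical_r(beta, K)
t1 = 1.0 / (1.0 + math.exp(beta) * w(r1))  # from (D.2)
det_direct = hessian_det(r1, t1, beta, K)
det_formula = det_closed_form(r1, beta, K)
print(r1, t1, det_direct, det_formula)
```

The two determinants agree only at critical points, because (D.4) was derived using (D.1) and (D.2).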

Investigating which points are critical, we see the following:

(1) Obviously, $r_0 = \frac{1}{2}$, $t_0 = \frac{2}{2 + e^{\beta}}$ is critical. Here $\partial_r^2 F(r_0, t_0) = 2 t_0^2 (2 + e^{\beta} - 4\beta K)$ and hence

$$A = 2 t_0^2 (2 + e^{\beta} - 4\beta K)(r - r_0)^2 + \frac{1}{t_0 (1 - t_0)} (t - t_0)^2.$$

Thus there is a local minimum of $F$ in $(r_0, t_0)$ if and only if $4\beta K < 2 + e^{\beta}$. If $4\beta K > 2 + e^{\beta}$, $(r_0, t_0)$ as defined above is not an extremal point.

(2) For $r \ne \frac{1}{2}$, we only consider $r \in I := (\frac{1}{2}, 1)$, since $F$ is symmetric in $r$ around $\frac{1}{2}$.

Combining (D.1) and (D.2) we see that a necessary condition for $(r,t)$ to be a local minimum is

$$h(r) := \log\frac{r}{1-r} = \frac{4\beta K (2r - 1)}{1 + e^{\beta}\sqrt{r(1-r)}} =: \varphi(r), \quad (D.5)$$

which we will investigate for solutions in $I$. Let $w(r) := \sqrt{r(1-r)}$. We compute

$$h'(r) = \frac{1}{r} + \frac{1}{1-r} = \frac{1}{w^2(r)}, \qquad h''(r) = -\frac{1}{r^2} + \frac{1}{(1-r)^2} = \frac{2r - 1}{w^4(r)}$$

and

$$\varphi'(r) = 4\beta K\,\frac{2 + 2e^{\beta} w(r) - (2r - 1) e^{\beta} w'(r)}{(1 + e^{\beta} w(r))^2} = 2\beta K\,\frac{4w(r) + e^{\beta}}{w(r)(1 + e^{\beta} w(r))^2}$$

and eventually

$$\varphi''(r) = 2\beta K\,\frac{4w'(r) w(r) (1 + e^{\beta} w(r))^2 - (4w(r) + e^{\beta}) [w(r)(1 + e^{\beta} w(r))^2]'}{w^2(r)(1 + e^{\beta} w(r))^4}$$

$$= 2\beta K w'(r)\,\frac{4w(r)(1 + e^{\beta} w(r)) - (4w(r) + e^{\beta}) [1 + e^{\beta} w(r) + 2 e^{\beta} w(r)]}{w^2(r)(1 + e^{\beta} w(r))^3}$$

$$= \beta K e^{\beta}\,\frac{(2r - 1)(8w^2(r) + 3e^{\beta} w(r) + 1)}{w^3(r)(1 + e^{\beta} w(r))^3}.$$
Now $h'(r) = \varphi'(r)$ (respectively $h'(r) < \varphi'(r)$, $h'(r) > \varphi'(r)$) is equivalent to

$$(e^{2\beta} - 8\beta K) w^2(r) + 2e^{\beta}(1 - \beta K) w(r) + 1 = 0 \quad (\text{respectively } < 0,\ > 0). \quad (D.6)$$

Hence there are at most two solutions $r_1, r_2 \in I$ with $\varphi' = h'$, because $w$ is injective on $I$. Therefore, according to Rolle's theorem, the equation $\varphi = h$ also has at most two further solutions in $I$ (next to $r = 1/2$). Moreover, we see that the left-hand side of (D.6) equals the numerator of $\det M$ in (D.4). In a critical point we thus have $h' < \varphi'$ (or $h' > \varphi'$, respectively) if and only if $\det M < 0$ (or $\det M > 0$, respectively) holds at this point.

Again we distinguish different cases:

If $4\beta K > 2 + e^{\beta}$, then $\varphi'(1/2) > h'(1/2)$ and thus $\varphi > h$ on $(1/2, 1/2 + \delta)$ for an appropriate $\delta > 0$. Now, close to $r = 1$ we always have $\varphi < h$, which means that there is at least one solution of $\varphi = h$ in $I$. However, there cannot be two such solutions: if there were $\frac{1}{2} < r_1 < r_2 < 1$ with $\varphi = h$, then $\varphi - h$ cannot change sign at both solutions, since otherwise we would have $\varphi > h$ also in a right neighborhood of $r_2$ and we would need a third solution $r_3$ to the right of $r_2$, in contradiction to the above conclusion. If, on the other hand, $\varphi - h$ does not change sign at both solutions, then at least one of $r_1$ and $r_2$ also solves $\varphi' = h'$. But this again leads to a contradiction: again using Rolle's theorem, we see that $\varphi = h$ for $\frac{1}{2} < r_1 < r_2$ implies that there exist $\xi_1, \xi_2$ with $\varphi' = h'$ and

$$\frac{1}{2} < \xi_1 < r_1 < \xi_2 < r_2,$$

and there cannot be more than two solutions of $\varphi' = h'$.

Hence there is exactly one solution $r_1 \in I$, and from (D.2) one obtains the corresponding $t_1$, such that $(r_1, t_1)$, $(1 - r_1, t_1)$ and $(r_0, t_0)$ are the only critical points of $F$. However, we already know that here $4\beta K > 2 + e^{\beta}$, and hence $(r_0, t_0)$ is not a minimum of $F$. Moreover, minima at the boundary do not exist. But $F$ is continuous on $[0,1]^2$ and therefore has a minimum; thus the points $(r_1, t_1)$ and $(1 - r_1, t_1)$ are global minima.
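
A crude grid search illustrates this case. The sketch below evaluates $F(r,t) = \beta t(1 - Kt(1-2r)^2) + tH(r) + H(t)$ on a regular grid for the illustrative parameters $\beta = 1$, $K = 1.3$ (so that $4\beta K > 2 + e^\beta$) and compares the grid minimum with the value at the symmetric critical point. The grid resolution and function names are arbitrary choices for this example, and a grid search is of course no substitute for the argument above.

```python
import math

def H(x):
    return x * math.log(x) + (1.0 - x) * math.log(1.0 - x)

def F(r, t, beta, K):
    return beta * t * (1.0 - K * t * (1.0 - 2.0 * r) ** 2) + t * H(r) + H(t)

def grid_argmin(beta, K, n=200):
    """Return (value, r, t) minimising F over the interior grid points (i/n, j/n)."""
    best = (math.inf, None, None)
    for i in range(1, n):
        r = i / n
        for j in range(1, n):
            t = j / n
            val = F(r, t, beta, K)
            if val < best[0]:
                best = (val, r, t)
    return best

beta, K = 1.0, 1.3                         # illustrative parameters
val_min, r_min, t_min = grid_argmin(beta, K)
t0 = 2.0 / (2.0 + math.exp(beta))
val_sym = F(0.5, t0, beta, K)              # value at the symmetric critical point
print(r_min, t_min, val_min, val_sym)
```

For these parameters the grid minimiser lies far from $r = 1/2$, and by the symmetry of $F$ the mirrored point $(1 - r, t)$ attains the same value.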

If, on the other hand, $4\beta K = 2 + e^{\beta}$ and $e^{\beta} > 4$, then $\varphi'(1/2) = h'(1/2)$ and of course $\varphi''(1/2) = h''(1/2) = 0$; however, we still have $\varphi'''(1/2) > h'''(1/2)$, hence again $\varphi > h$ on $(1/2, 1/2 + \delta)$ for an appropriate $\delta > 0$. That $\varphi'''(1/2) > h'''(1/2)$ can be seen as follows: write

$$v(u) := \frac{8u^2 + 3e^{\beta} u + 1}{u^3 (1 + e^{\beta} u)^3}.$$

Then $\varphi''(r) = \beta K e^{\beta} (2r - 1)\, v \circ w(r)$ and hence

$$\varphi'''(r) = \beta K e^{\beta} \left( 2\, v \circ w(r) - (2r - 1)^2 \frac{1}{2w(r)}\, v' \circ w(r) \right). \quad (D.7)$$

Thus

$$\varphi'''(1/2) = \frac{1}{2}(2 + e^{\beta}) e^{\beta}\, v(1/2) = 48\,\frac{e^{\beta}}{2 + e^{\beta}}.$$

Due to $h'''(1/2) = 32$ we have $\varphi'''(1/2) > h'''(1/2)$ if and only if $e^{\beta} > 4$.

Analogously to our arguments above we see that there is only one solution $r_1 \in I$ of $\varphi = h$, and again the corresponding $t_1$ can be computed from (D.2). Indeed there is a local minimum of $F$ in $(r_1, t_1)$ and $(1 - r_1, t_1)$. This can be seen by showing that the Hessian is positive definite. However, as this is not part of our assertion, we will refrain from doing so.
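
The third-derivative values used in this case can be checked with a finite-difference stencil. The sketch below uses $\varphi(r) = \frac{4\beta K(2r-1)}{1 + e^\beta\sqrt{r(1-r)}}$ and $h(r) = \log\frac{r}{1-r}$ with the illustrative choice $e^\beta = 5$ and $4\beta K = 2 + e^\beta$; the step size and the function names are assumptions of this example.

```python
import math

def h(r):
    return math.log(r / (1.0 - r))

def phi(r, beta, K):
    return 4.0 * beta * K * (2.0 * r - 1.0) / (1.0 + math.exp(beta) * math.sqrt(r * (1.0 - r)))

def third_derivative(f, x, step=1e-3):
    """Central five-point stencil for f'''(x), accurate to O(step^2)."""
    return (f(x + 2 * step) - 2 * f(x + step) + 2 * f(x - step) - f(x - 2 * step)) / (2 * step ** 3)

e_beta = 5.0                               # illustrative value with e^beta > 4
beta = math.log(e_beta)
K = (2.0 + e_beta) / (4.0 * beta)          # enforces 4*beta*K = 2 + e^beta
phi3_numeric = third_derivative(lambda r: phi(r, beta, K), 0.5)
phi3_closed = 48.0 * e_beta / (2.0 + e_beta)
h3_numeric = third_derivative(h, 0.5)
print(phi3_numeric, phi3_closed, h3_numeric)
```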

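If, finally, $4\beta K = 2 + e^{\beta}$ and $e^{\beta} \le 4$, then $\varphi'(1/2) = h'(1/2)$, $\varphi'''(1/2) \le h'''(1/2)$ and $\varphi^{(5)}(1/2) < h^{(5)}(1/2)$, such that $\varphi < h$ on $(1/2, 1/2 + \delta)$ for an appropriate $\delta > 0$.

For $\varphi^{(5)}(1/2) < h^{(5)}(1/2)$ one argues as follows. Because of (D.7) we have

$$\varphi^{(5)}(1/2) = \beta K e^{\beta} \left( 2 (v \circ w)''(1/2) - 8 v'(1/2) \right)$$

$$= 2\beta K e^{\beta} \left( ((v' \circ w) \cdot w')'(1/2) - 4 v'(1/2) \right)$$

$$= 2\beta K e^{\beta} \left( v'(1/2)\, w''(1/2) - 4 v'(1/2) \right)$$

$$= -12 \beta K e^{\beta} v'(1/2)$$

and

$$v'(1/2) = \frac{(8 + 3e^{\beta}) \frac{1}{4} (2 + e^{\beta}) - 3 (1 + e^{\beta})(3 + \frac{3}{2} e^{\beta})}{(\frac{1}{2} + \frac{1}{4} e^{\beta})^4} = -320\,\frac{2 + 3e^{\beta}}{(2 + e^{\beta})^3},$$

thus

$$\varphi^{(5)}(1/2) = 960\, e^{\beta}\,\frac{2 + 3e^{\beta}}{(2 + e^{\beta})^2}.$$

Because of $h^{(5)}(1/2) = 4! \cdot 2^6$ one has $\varphi^{(5)}(1/2) < h^{(5)}(1/2)$ if and only if $5e^{\beta}(2 + 3e^{\beta}) < 8(2 + e^{\beta})^2$, thus $7e^{2\beta} - 22e^{\beta} - 32 < 0$, and this is true for all $0 < e^{\beta} \le 4$.

The same is of course also true when $4\beta K < 2 + e^{\beta}$, since then we already have $\varphi'(1/2) < h'(1/2)$.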
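
The final inequality of the fifth-derivative comparison is elementary and can be confirmed on a grid. In the sketch below $x$ stands for $e^\beta$; the function names and the grid are chosen only for this illustration.

```python
import math

def phi5(x):
    """phi^(5)(1/2) as a function of x = e^beta, on the curve 4*beta*K = 2 + e^beta."""
    return 960.0 * x * (2.0 + 3.0 * x) / (2.0 + x) ** 2

H5 = math.factorial(4) * 2 ** 6           # h^(5)(1/2) = 4! * 2^6 = 1536

def poly(x):
    """7 x^2 - 22 x - 32, negative exactly where phi5(x) < H5 for x > 0."""
    return 7.0 * x * x - 22.0 * x - 32.0

# Positive root of the quadratic; the inequality holds for 0 < x < positive_root.
positive_root = (22.0 + math.sqrt(22.0 ** 2 + 4.0 * 7.0 * 32.0)) / 14.0
grid = [k / 100.0 for k in range(1, 401)]  # x in (0, 4]
holds_on_grid = all(poly(x) < 0.0 and phi5(x) < H5 for x in grid)
print(positive_root, holds_on_grid)
```

The positive root lies slightly above 4, which is why the whole range $0 < e^\beta \le 4$ is covered.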

Summarizing we see that in all possible cases we have at most three local 
minima of F and none at the boundary. Of course, we could discuss how many 
minima there are exactly in certain cases. However, we will refrain from doing 
so, since this is not needed. 

The second result we prove in this appendix is needed for the slow convergence case.

Lemma D.1. There exists an $\varepsilon_0 > 0$ such that for any $0 < \varepsilon < \varepsilon_0$, on the set

$$M = \{a : |S_N(a)| < N\varepsilon\}$$

as defined in (5.3), the free energy $f_\beta$ is unimodal for all $\beta$.

Proof. The claim is true if we find an $\varepsilon_0 > 0$ such that $f_\beta((a_{-1}, a_0, a_1))$ is unimodal on $|a_{-1} - a_1| < \varepsilon_0$. Consider

$$-f_\beta(a_1, a_0, a_1) = 2\beta a_1 + 2a_1 \log(a_1) + (1 - 2a_1) \log(1 - 2a_1), \quad (D.8)$$

$$-f_\beta(a_1, a_0, a_1)' = 2\beta - 2\log\frac{1 - 2a_1}{a_1}, \quad (D.9)$$

which tells us that there is exactly one mode on the line $a_1 = a_{-1}$, $a_0 = 1 - 2a_1$. As $f_\beta$ is smooth, this generalizes to all lines $a_1 = a_{-1} + 2\varepsilon_0$ for sufficiently small $\varepsilon_0$. This yields the desired result by using Theorem 3.1, as all that could happen are maxima on the boundary. □
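
The one-dimensional statement on the symmetric line can be made concrete as follows. The sketch differentiates (D.8) with respect to $a_1$, which gives $2\beta + 2\log\frac{a_1}{1 - 2a_1}$, and checks that this derivative vanishes exactly once on $(0, 1/2)$, namely at $a_1 = 1/(2 + e^\beta)$. The function names, the grid, and the tested values of $\beta$ are assumptions made for this example.

```python
import math

def neg_f_sym(a, beta):
    """Right-hand side of (D.8): -f_beta on the line a_1 = a_{-1}, a_0 = 1 - 2 a_1."""
    return 2.0 * beta * a + 2.0 * a * math.log(a) + (1.0 - 2.0 * a) * math.log(1.0 - 2.0 * a)

def d_neg_f_sym(a, beta):
    """Derivative of (D.8) with respect to a_1."""
    return 2.0 * beta + 2.0 * math.log(a / (1.0 - 2.0 * a))

def critical_a(beta):
    """Unique zero of the derivative: a_1 / (1 - 2 a_1) = exp(-beta)."""
    return 1.0 / (2.0 + math.exp(beta))

def sign_changes(beta, n=1000):
    """Count sign changes of the derivative on a grid of the open interval (0, 1/2)."""
    values = [d_neg_f_sym(0.5 * k / n, beta) for k in range(1, n)]
    return sum(1 for u, v in zip(values, values[1:]) if u * v < 0.0)

betas = [0.5, 1.0, 3.0]
residuals = [abs(d_neg_f_sym(critical_a(b), b)) for b in betas]
changes = [sign_changes(b) for b in betas]
print(residuals, changes)
```

Since the derivative is strictly increasing in $a_1$, the single sign change corresponds to the unique minimum of $-f_\beta$, i.e. the unique mode of $f_\beta$, on this line.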